
[Maria-developers] Updated (by Knielsen): Add a mysqlbinlog option to filter certain kinds of statements (41)
by worklog-noreply＠askmonty.org 17 Aug '09


-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter certain kinds of statements
CREATION DATE..: Mon, 10 Aug 2009, 15:30
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 41 (http://askmonty.org/worklog/?tid=41)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Knielsen - Mon, 17 Aug 2009, 13:56)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.10632 2009-08-17 13:56:52.000000000 +0300
+++ /tmp/wklog.41.new.10632 2009-08-17 13:56:52.000000000 +0300
@@ -14,3 +14,10 @@
   - Remove all pre-space
   - Compare the string case-insensitively
   - etc
+
+Option 3:
+
+Server-side support for ignoring certain statements:
+
+  SET SESSION ignored_statements="alter table, analyze table, ...";
+

-=-=(Knielsen - Fri, 14 Aug 2009, 14:17)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.6963 2009-08-14 14:17:32.000000000 +0300
+++ /tmp/wklog.41.new.6963 2009-08-14 14:17:32.000000000 +0300
@@ -1,6 +1,11 @@
 The implementation will depend on design choices made in WL#40:
-- If we decide to parse the statement, SQL-verb filtering will be trivial
-- If we decide not to parse the statement, we still can reliably distinguish the
+
+Option 1:
+
+If we decide to parse the statement, SQL-verb filtering will be trivial
+
+Option 2:
+If we decide not to parse the statement, we still can reliably distinguish the
 statement by matching the first characters against a set of patterns.

 If we chose the second, we'll have to perform certain normalization before

-=-=(Psergey - Mon, 10 Aug 2009, 15:47)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.13282 2009-08-10 15:47:13.000000000 +0300
+++ /tmp/wklog.41.new.13282 2009-08-10 15:47:13.000000000 +0300
@@ -2,3 +2,10 @@
 - If we decide to parse the statement, SQL-verb filtering will be trivial
 - If we decide not to parse the statement, we still can reliably distinguish the
 statement by matching the first characters against a set of patterns.
+
+If we chose the second, we'll have to perform certain normalization before
+matching the patterns:
+  - Remove all comments from the command
+  - Remove all pre-space
+  - Compare the string case-insensitively
+  - etc

-=-=(Psergey - Mon, 10 Aug 2009, 15:35)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.12689 2009-08-10 15:35:04.000000000 +0300
+++ /tmp/wklog.41.new.12689 2009-08-10 15:35:04.000000000 +0300
@@ -1 +1,4 @@
-
+The implementation will depend on design choices made in WL#40:
+- If we decide to parse the statement, SQL-verb filtering will be trivial
+- If we decide not to parse the statement, we still can reliably distinguish the
+statement by matching the first characters against a set of patterns.

-=-=(Psergey - Mon, 10 Aug 2009, 15:31)=-=-
Dependency created: 39 now depends on 41


DESCRIPTION:

Add a mysqlbinlog option to filter out certain kinds of statements, e.g.
(syntax subject to discussion):

  mysqlbinlog --exclude='alter table,drop table,alter database,...'


HIGH-LEVEL SPECIFICATION:

The implementation will depend on design choices made in WL#40:

Option 1:

If we decide to parse the statement, SQL-verb filtering will be trivial.

Option 2:
If we decide not to parse the statement, we can still reliably distinguish the
statement by matching its first characters against a set of patterns.

If we choose the second option, we'll have to perform certain normalization
before matching the patterns:
  - Remove all comments from the command
  - Remove all leading whitespace
  - Compare the string case-insensitively
  - etc

Option 3:

Server-side support for ignoring certain statements:

  SET SESSION ignored_statements="alter table, analyze table, ...";


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
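As an illustration of Option 2 in WL#41 above, here is a minimal Python sketch of prefix matching after normalization. It is not mysqlbinlog code: the helper names (normalize, parse_exclude, is_excluded) are hypothetical, the --exclude value format is the one proposed in the task description (still "subject to discussion"), and the comment stripping is deliberately naive (it does not recognize comment markers inside string literals, and it also strips /*! ... */ executable comments, which a real implementation would have to treat with care).

```python
import re

# Matches /* ... */ block comments and "-- " / "#" comments up to end of line.
# Simplification: comment markers inside string literals are not recognized.
_COMMENT_RE = re.compile(r"/\*.*?\*/|--[ \t][^\n]*|#[^\n]*", re.DOTALL)


def normalize(statement: str) -> str:
    """Strip comments, collapse runs of whitespace, and lowercase."""
    without_comments = _COMMENT_RE.sub(" ", statement)
    return " ".join(without_comments.split()).lower()


def parse_exclude(option_value: str) -> list[str]:
    """Split a comma-separated exclude list into normalized prefixes."""
    return [normalize(p) for p in option_value.split(",") if p.strip()]


def is_excluded(statement: str, prefixes: list[str]) -> bool:
    """True if the normalized statement begins with an excluded prefix.

    The prefix must be followed by a space (or be the whole statement),
    so that "alter table" does not also match "alter tablespace".
    """
    normalized = normalize(statement)
    return any(
        normalized == p or normalized.startswith(p + " ") for p in prefixes
    )
```

Usage would look like is_excluded(stmt, parse_exclude("alter table,drop table")). Requiring a word boundary after the prefix is a design choice of this sketch, not something the task text specifies.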


[Maria-developers] Updated (by Knielsen): Add a mysqlbinlog option to change the used database (36)
by worklog-noreply＠askmonty.org 17 Aug '09


-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the used database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Knielsen - Mon, 17 Aug 2009, 12:44)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.7834 2009-08-17 12:44:17.000000000 +0300
+++ /tmp/wklog.36.new.7834 2009-08-17 12:44:17.000000000 +0300
@@ -13,7 +13,9 @@
   statement refers to tables in current database, so that changing the current
   database will make the statement to work on a table in a different database).

-See also MySQL BUG#42941.
+See also MySQL BUG#42941. Note this bug is fixed in MySQL 5.1.37, which is not
+merged into MariaDB at the time of writing, but planned to be merged before
+release.

 What we could do
 ----------------

-=-=(Guest - Sun, 16 Aug 2009, 17:11)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.27162 2009-08-16 17:11:12.000000000 +0300
+++ /tmp/wklog.36.new.27162 2009-08-16 17:11:12.000000000 +0300
@@ -13,6 +13,8 @@
   statement refers to tables in current database, so that changing the current
   database will make the statement to work on a table in a different database).

+See also MySQL BUG#42941.
+
 What we could do
 ----------------

-=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13035 2009-08-10 15:41:51.000000000 +0300
+++ /tmp/wklog.36.new.13035 2009-08-10 15:41:51.000000000 +0300
@@ -1,5 +1,7 @@
 Context
 -------
+(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
+overview)
 At the moment, the server has a replication slave option

   --replicate-rewrite-db="from->to"

-=-=(Guest - Mon, 10 Aug 2009, 11:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.6580 2009-08-10 11:12:36.000000000 +0300
+++ /tmp/wklog.36.new.6580 2009-08-10 11:12:36.000000000 +0300
@@ -1,4 +1,3 @@
-
 Context
 -------
 At the moment, the server has a replication slave option
@@ -67,6 +66,6 @@

 It will be possible to do the rewrites either on the slave (
 --replicate-rewrite-db will work for all kinds of statements), or in
-mysqlbinlog (adding a comment is easy and doesn't require use to parse the
-statement).
+mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
+parse the statement).


-=-=(Psergey - Sun, 09 Aug 2009, 23:53)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13425 2009-08-09 23:53:54.000000000 +0300
+++ /tmp/wklog.36.new.13425 2009-08-09 23:53:54.000000000 +0300
@@ -1 +1,72 @@

+Context
+-------
+At the moment, the server has a replication slave option
+
+  --replicate-rewrite-db="from->to"
+
+the option affects
+- Table_map_log_event (all RBR events)
+- Load_log_event (LOAD DATA)
+- Query_log_event (SBR-based updates, with the usual assumption that the
+  statement refers to tables in current database, so that changing the current
+  database will make the statement to work on a table in a different database).
+
+What we could do
+----------------
+
+Option1: make mysqlbinlog accept --replicate-rewrite-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
+same extent as replication slave would process --replicate-rewrite-db option.
+
+
+Option2: Add database-agnostic RBR events and --strip-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Right now RBR events require a databasename. It is not possible to have RBR
+event stream that won't mention which database the events are for. When I
+tried to use debugger and specify empty database name, attempt to apply the
+binlog resulted in this error:
+
+090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
+opening tables,
+
+We could do as follows:
+- Make the server interpret empty database name in RBR event (i.e. in a
+  Table_map_log_event) as "use current database". Binlog slave thread
+  probably should not allow such events as it doesn't have a natural current
+  database.
+- Add a mysqlbinlog --strip-db option that would
+  = not produce any "USE dbname" statements
+  = change databasename for all RBR events to be empty
+
+That way, mysqlbinlog output will be database-agnostic and apply to the
+current database.
+(this will have the usual limitations that we assume that all statements in
+the binlog refer to the current database).
+
+Option3: Enhance database rewrite
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If there is a need to support database change for statements that use
+dbname.tablename notation and are replicated as statements (i.e. are DDL
+statements and/or DML statements that are binlogged as statements),
+then that could be supported as follows:
+
+- Make the server's parser recognize special form of comments
+
+    /* !database-alias(oldname,newname) */
+
+  and save the mapping somewhere
+
+- Put the hooks in table open and name resolution code to use the saved
+  mapping.
+
+
+Once we've done the above, it will be easy to perform a complete,
+no-compromise or restrictions database name change in binary log.
+
+It will be possible to do the rewrites either on the slave (
+--replicate-rewrite-db will work for all kinds of statements), or in
+mysqlbinlog (adding a comment is easy and doesn't require use to parse the
+statement).
+

-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36

-=-=(Psergey - Fri, 07 Aug 2009, 14:57)=-=-
Title modified.
--- /tmp/wklog.36.old.14687 2009-08-07 14:57:49.000000000 +0300
+++ /tmp/wklog.36.new.14687 2009-08-07 14:57:49.000000000 +0300
@@ -1 +1 @@
-Add a mysqlbinlog option to change the database
+Add a mysqlbinlog option to change the used database


DESCRIPTION:

Sometimes there is a need to take a binary log and apply it to a database with
a different name than the original name of the database on the binlog producer.

If one is using statement-based replication, one can achieve this by grepping
"USE dbname" statements out of the output of mysqlbinlog(*). With row-based
replication this is no longer possible, as the database name is encoded within
the BINLOG '....' statement.

This task is about adding an option to mysqlbinlog that would allow changing
the names of the databases used in both RBR and SBR events.

(*) this implies that all statements refer to tables in the current database,
doesn't catch updates made inside stored functions and so forth, but still
works for a practically important subset of cases.


HIGH-LEVEL SPECIFICATION:

Context
-------
(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
overview)
At the moment, the server has a replication slave option

  --replicate-rewrite-db="from->to"

The option affects
- Table_map_log_event (all RBR events)
- Load_log_event (LOAD DATA)
- Query_log_event (SBR-based updates, with the usual assumption that the
  statement refers to tables in the current database, so that changing the
  current database will make the statement work on a table in a different
  database).

See also MySQL BUG#42941. Note this bug is fixed in MySQL 5.1.37, which is not
merged into MariaDB at the time of writing, but planned to be merged before
release.

What we could do
----------------

Option1: make mysqlbinlog accept --replicate-rewrite-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
same extent as a replication slave would process the --replicate-rewrite-db
option.


Option2: Add database-agnostic RBR events and --strip-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Right now RBR events require a database name. It is not possible to have an
RBR event stream that doesn't mention which database the events are for. When
I used a debugger to specify an empty database name, the attempt to apply the
binlog resulted in this error:

090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
opening tables,

We could do as follows:
- Make the server interpret an empty database name in an RBR event (i.e. in a
  Table_map_log_event) as "use current database". The binlog slave thread
  probably should not allow such events, as it doesn't have a natural current
  database.
- Add a mysqlbinlog --strip-db option that would
  = not produce any "USE dbname" statements
  = change the database name for all RBR events to be empty

That way, mysqlbinlog output will be database-agnostic and apply to the
current database.
(This will have the usual limitation that we assume that all statements in
the binlog refer to the current database.)

Option3: Enhance database rewrite
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is a need to support database change for statements that use
dbname.tablename notation and are replicated as statements (i.e. are DDL
statements and/or DML statements that are binlogged as statements),
then that could be supported as follows:

- Make the server's parser recognize a special form of comment

    /* !database-alias(oldname,newname) */

  and save the mapping somewhere

- Put hooks in the table open and name resolution code to use the saved
  mapping.


Once we've done the above, it will be easy to perform a complete database
name change in the binary log, without compromises or restrictions.

It will be possible to do the rewrites either on the slave (
--replicate-rewrite-db will work for all kinds of statements), or in
mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
parse the statement).


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
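To make the statement-based case from the WL#36 description concrete, here is a rough Python sketch of what a text-level database rename could look like today: it parses rules in the "from->to" form used by --replicate-rewrite-db and renames the database in "USE dbname" lines of mysqlbinlog text output. The function names are hypothetical and the regular expression encodes an assumed shape of a USE line. It shares the limitation noted in the description (statements must refer to the current database) and does nothing for RBR events, whose database name sits inside the BINLOG '....' payload.

```python
import re

# Assumed shape of a USE line: the keyword, an optionally backquoted name,
# then anything else (such as a delimiter) up to the end of the line.
_USE_RE = re.compile(r"^(\s*use\s+)(`?)([^`\s;/]+)(`?)(.*)$", re.IGNORECASE)


def parse_rewrite_rule(rule: str) -> tuple[str, str]:
    """Parse a 'from->to' rule into a (source, target) pair."""
    source, sep, target = rule.partition("->")
    if not sep or not source.strip() or not target.strip():
        raise ValueError(f"expected 'from->to', got {rule!r}")
    return source.strip(), target.strip()


def rewrite_use_lines(lines, rules):
    """Yield lines with the database in USE statements renamed per rules.

    `lines` are expected without trailing newlines; all lines that are
    not USE statements for a mapped database pass through unchanged.
    """
    mapping = dict(parse_rewrite_rule(r) for r in rules)
    for line in lines:
        m = _USE_RE.match(line)
        if m and m.group(3) in mapping:
            prefix, open_q, name, close_q, rest = m.groups()
            line = f"{prefix}{open_q}{mapping[name]}{close_q}{rest}"
        yield line
```

For example, list(rewrite_use_lines(text.splitlines(), ["db1->db2"])) would rename db1 to db2 in USE lines only; statements that name dbname.tablename explicitly are the case Option3 is meant to address.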


[Maria-developers] Updated (by Knielsen): Change BINLOG statement syntax to be human-readable (46)
by worklog-noreply＠askmonty.org 17 Aug '09


-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Change BINLOG statement syntax to be human-readable
CREATION DATE..: Sat, 15 Aug 2009, 23:42
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 46 (http://askmonty.org/worklog/?tid=46)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Knielsen - Mon, 17 Aug 2009, 11:38)=-=-
High-Level Specification modified.
--- /tmp/wklog.46.old.4940 2009-08-17 11:38:49.000000000 +0300
+++ /tmp/wklog.46.new.4940 2009-08-17 11:38:49.000000000 +0300
@@ -16,8 +16,8 @@

 Feedback and other suggestions
 ------------------------------
-* What is the need for WITH TIMESTAMP part? Can't one use a separate
-  SET TIMESTAMP statement?
+* TIMESTAMP part is better omitted and replaced with separate SET TIMESTAMP
+statement for consistency with other events.

 * mysqlbinlog --base64-output=DECODE-ROWS --verbose already produces something
   that's close to readable SQL. Can we make it to be regular parseable SQL?
@@ -25,6 +25,7 @@
   - A stream of SQL statements will be slower to run than BINLOG statements
     (due to locking, table open/close, etc). (TODO: is it really slower? we
     haven't checked).
+    One problem is that we do not have column names available in the binary log.

 * When SBR replication is used and the statements refer to the current database
   (a common scenario), one can use awk to filter out updates made in certain

-=-=(Psergey - Sun, 16 Aug 2009, 11:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.46.old.13453 2009-08-16 11:30:06.000000000 +0300
+++ /tmp/wklog.46.new.13453 2009-08-16 11:30:06.000000000 +0300
@@ -26,3 +26,7 @@
     (due to locking, table open/close, etc). (TODO: is it really slower? we
     haven't checked).

+* When SBR replication is used and the statements refer to the current database
+  (a common scenario), one can use awk to filter out updates made in certain
+  databases. The proposed syntax doesn't allow to perform equivalent filtering?
+

-=-=(Psergey - Sun, 16 Aug 2009, 11:13)=-=-
High Level Description modified.
--- /tmp/wklog.46.old.12747 2009-08-16 11:13:54.000000000 +0300
+++ /tmp/wklog.46.new.12747 2009-08-16 11:13:54.000000000 +0300
@@ -6,4 +6,4 @@
 This WL task is about making BINLOG statements to be human-readable
 (either as an option or by default

-The approach of this WL is to some extent an alternative to WL#38, WL#40, WL41.
+The approach of this WL is to some extent an alternative to WL#38, WL#40, WL#41.

-=-=(Psergey - Sun, 16 Aug 2009, 11:13)=-=-
High Level Description modified.
--- /tmp/wklog.46.old.12717 2009-08-16 11:13:40.000000000 +0300
+++ /tmp/wklog.46.new.12717 2009-08-16 11:13:40.000000000 +0300
@@ -5,3 +5,5 @@

 This WL task is about making BINLOG statements to be human-readable
 (either as an option or by default
+
+The approach of this WL is to some extent an alternative to WL#38, WL#40, WL41.

-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency deleted: 48 no longer depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency created: 48 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency deleted: 39 no longer depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sat, 15 Aug 2009, 23:43)=-=-
High-Level Specification modified.
--- /tmp/wklog.46.old.17742 2009-08-15 23:43:09.000000000 +0300
+++ /tmp/wklog.46.new.17742 2009-08-15 23:43:09.000000000 +0300
@@ -1 +1,28 @@
+Suggestion 1
+------------
+Original syntax suggestion by Kristian:
+
+  BINLOG
+    WITH TIMESTAMP xxx SERVER_ID 1 MASTER_POS 415 FLAGS 0x0
+    TABLE db1.table1 AS 1 COLUMNS (INT NOT NULL, BLOB, VARCHAR(100)) FLAGS 0x0
+    TABLE db2.table2 AS 2 COLUMNS (CHAR(10)) FLAGS 0x0
+    WRITE_ROW INTO db1.table1(1,3) VALUES (42, 'foobar'), (10, NULL) FLAGS 0x2
+    UPDATE_ROW INTO db2.table2 (1) (1) VALUES FROM ('beforeval') TO ('toval'),
+      FROM ('a') TO ('b') FLAGS 0x0
+    DELETE_ROW INTO db2.table2 (1) VALUES ('row_to_delete') FLAGS 0x0;
+
+  This is basically a dump of what is stored in the events, and would be an
+  alternative to BINLOG 'gwWEShMBAA...'.
+
+Feedback and other suggestions
+------------------------------
+* What is the need for WITH TIMESTAMP part? Can't one use a separate
+  SET TIMESTAMP statement?
+
+* mysqlbinlog --base64-output=DECODE-ROWS --verbose already produces something
+  that's close to readable SQL. Can we make it to be regular parseable SQL?
+  + This will be syntax that's familiar to our parser and to the users
+  - A stream of SQL statements will be slower to run than BINLOG statements
+    (due to locking, table open/close, etc). (TODO: is it really slower? we
+    haven't checked).


DESCRIPTION:

One of the great things about mysqlbinlog was that its output was human-readable
SQL, so it was possible to edit it manually or with the help of scripts. With RBR
events and BINLOG 'DpiGShMBAAAALQAAADcBAA...' statements this is no longer the
case.

This WL task is about making BINLOG statements human-readable
(either as an option or by default).

The approach of this WL is to some extent an alternative to WL#38, WL#40, WL#41.

HIGH-LEVEL SPECIFICATION:

Suggestion 1
------------
Original syntax suggestion by Kristian:

  BINLOG
    WITH TIMESTAMP xxx SERVER_ID 1 MASTER_POS 415 FLAGS 0x0
    TABLE db1.table1 AS 1 COLUMNS (INT NOT NULL, BLOB, VARCHAR(100)) FLAGS 0x0
    TABLE db2.table2 AS 2 COLUMNS (CHAR(10)) FLAGS 0x0
    WRITE_ROW INTO db1.table1(1,3) VALUES (42, 'foobar'), (10, NULL) FLAGS 0x2
    UPDATE_ROW INTO db2.table2 (1) (1) VALUES FROM ('beforeval') TO ('toval'),
      FROM ('a') TO ('b') FLAGS 0x0
    DELETE_ROW INTO db2.table2 (1) VALUES ('row_to_delete') FLAGS 0x0;

  This is basically a dump of what is stored in the events, and would be an
  alternative to BINLOG 'gwWEShMBAA...'.

Feedback and other suggestions
------------------------------
* The TIMESTAMP part is better omitted and replaced with a separate SET TIMESTAMP
  statement, for consistency with other events.

* mysqlbinlog --base64-output=DECODE-ROWS --verbose already produces something
  that's close to readable SQL. Can we make it to be regular parseable SQL?
  + This will be syntax that's familiar to our parser and to the users
  - A stream of SQL statements will be slower to run than BINLOG statements
    (due to locking, table open/close, etc). (TODO: is it really slower? we
    haven't checked).
    One problem is that we do not have column names available in the binary log.

* When SBR replication is used and the statements refer to the current database
  (a common scenario), one can use awk to filter out updates made in certain
  databases. The proposed syntax doesn't allow to perform equivalent filtering?


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
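The awk-style filtering mentioned in the last feedback item can be sketched roughly as follows. This is only an illustration, assuming the mysqlbinlog output has already been split into individual statements and that database switches appear as plain "USE dbname" statements. The name filter_by_database and the input format are hypothetical and not part of mysqlbinlog.

```python
import re

# Matches "USE dbname", with optional backticks and trailing semicolon.
USE_RE = re.compile(r"^\s*use\s+`?(\w+)`?\s*;?\s*$", re.IGNORECASE)


def filter_by_database(statements, skip_dbs):
    """Drop every statement executed while the current database is in skip_dbs.

    Hypothetical helper: relies on the SBR assumption that statements refer
    only to tables in the current database.
    """
    current_db = None
    kept = []
    for stmt in statements:
        match = USE_RE.match(stmt)
        if match:
            current_db = match.group(1)
        if current_db in skip_dbs:
            continue  # also drops the USE statement of a skipped database
        kept.append(stmt)
    return kept


stream = [
    "USE db1;",
    "INSERT INTO t1 VALUES (1);",
    "USE db2;",
    "DELETE FROM t2;",
    "USE db1;",
    "UPDATE t1 SET a = 2;",
]
filtered = filter_by_database(stream, {"db2"})
```

A text filter like this has nothing to match on when the change is carried by a BINLOG 'base64...' statement, since the database name is inside the encoded event. Whether the proposed readable syntax would make equivalent filtering practical is the open question raised above.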

[Maria-developers] Progress (by Knielsen): improving mysqlbinlog output and doing rename (39)
by worklog-noreply＠askmonty.org 17 Aug '09

17 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: improving mysqlbinlog output and doing rename
CREATION DATE..: Sun, 09 Aug 2009, 12:24
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-RawIdeaBin
TASK ID........: 39 (http://askmonty.org/worklog/?tid=39)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 29
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Knielsen - Mon, 17 Aug 2009, 11:13)=-=-
Research and architecture review.

Worked 12 hours and estimate 0 hours remain (original estimate increased by 12 hours).

-=-=(Knielsen - Mon, 17 Aug 2009, 11:13)=-=-
Reported zero hours worked. Estimate unchanged.

-=-=(Psergey - Sun, 16 Aug 2009, 12:07)=-=-
Dependency created: 39 now depends on 49

-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency deleted: 39 no longer depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 47

-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 45

-=-=(Guest - Fri, 14 Aug 2009, 15:52)=-=-
Title modified.
--- /tmp/wklog.39.old.11123 2009-08-14 15:52:29.000000000 +0300
+++ /tmp/wklog.39.new.11123 2009-08-14 15:52:29.000000000 +0300
@@ -1 +1 @@
-Replication tasks
+improving mysqlbinlog output and doing rename

-=-=(Guest - Mon, 10 Aug 2009, 16:32)=-=-
Adding 1 hour for Monty's initial work on starting the architecture review.

Worked 1 hour and estimate 0 hours remain (original estimate increased by 1 hour).
------------------------------------------------------------

-=-=(View All Progress Notes, 16 total)=-=-

http://askmonty.org/worklog/index.pl?tid=39&nolimit=1

DESCRIPTION:

A combined task for all replication tasks.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

[Maria-developers] Progress (by Knielsen): improving mysqlbinlog output and doing rename (39)
by worklog-noreply＠askmonty.org 17 Aug '09

17 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: improving mysqlbinlog output and doing rename
CREATION DATE..: Sun, 09 Aug 2009, 12:24
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-RawIdeaBin
TASK ID........: 39 (http://askmonty.org/worklog/?tid=39)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 17
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Knielsen - Mon, 17 Aug 2009, 11:13)=-=-
Reported zero hours worked. Estimate unchanged.

-=-=(Psergey - Sun, 16 Aug 2009, 12:07)=-=-
Dependency created: 39 now depends on 49

-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency deleted: 39 no longer depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 47

-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 45

-=-=(Guest - Fri, 14 Aug 2009, 15:52)=-=-
Title modified.
--- /tmp/wklog.39.old.11123 2009-08-14 15:52:29.000000000 +0300
+++ /tmp/wklog.39.new.11123 2009-08-14 15:52:29.000000000 +0300
@@ -1 +1 @@
-Replication tasks
+improving mysqlbinlog output and doing rename

-=-=(Guest - Mon, 10 Aug 2009, 16:32)=-=-
Adding 1 hour for Monty's initial work on starting the architecture review.

Worked 1 hour and estimate 0 hours remain (original estimate increased by 1 hour).

-=-=(Psergey - Mon, 10 Aug 2009, 15:59)=-=-
Re-searched and added subtasks.

Worked 16 hours and estimate 0 hours remain (original estimate increased by 16 hours).
------------------------------------------------------------

-=-=(View All Progress Notes, 15 total)=-=-

http://askmonty.org/worklog/index.pl?tid=39&nolimit=1

DESCRIPTION:

A combined task for all replication tasks.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

[Maria-developers] Rev 2730: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 16 Aug '09

16 Aug '09

At file:///home/psergey/dev/maria-5.1-table-elim-r10/

------------------------------------------------------------
revno: 2730
revision-id: psergey(a)askmonty.org-20090816180159-z3lfkjpjfsm7zbp0
parent: psergey(a)askmonty.org-20090816143547-16hyle50tbt31xen
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r10
timestamp: Sun 2009-08-16 21:01:59 +0300
message:
  MWL#17: Table elimination
  - More comments
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc 2009-08-16 14:35:47 +0000
+++ b/sql/opt_table_elimination.cc 2009-08-16 18:01:59 +0000
@@ -58,7 +58,7 @@

   Table elimination is redone on every PS re-execution.

-  IMPLEMENTATION
+  TABLE ELIMINATION ALGORITHM

   As said above, we can remove inner side of an outer join if it is

@@ -67,23 +67,59 @@

   We check #1 by doing a recursive descent down the join->join_list while
   maintaining a union of used_tables() attribute of all expressions we've seen
-  "elsewhere". When we encounter an outer join, we check if the bitmap of
-  tables on its inner side has intersection with tables that are used
-  elsewhere. No intersection means that inner side of the outer join could
+  "elsewhere". When we encounter an outer join, we check if the bitmap of
+  tables on its inner side has an intersection with tables that are used
+  elsewhere. No intersection means that inner side of the outer join could
   potentially be eliminated.

-  #2 is checked using a concept of values and modules that indicate
-  dependencies between them.
-
-  We start with
-  of certain values that functional dependencies between
-  them. There are two kinds of values:
-*/
-
-/*
-  A value
-
-  functional dependencies between two kinds of entities:
+  In order to check #2, one needs to prove that inner side of an outer join
+  is functionally dependent on the outside. We prove dependency by proving
+  functional dependency of intermediate objects:
+
+  - Inner side of outer join is functionally dependent when each of its tables
+    are functionally dependent. (We assume a table is functionally dependent
+    when its dependencies allow to uniquely identify one table record, or no
+    records).
+
+  - Table is functionally dependent when it has got a unique key whose columns
+    are functionally dependent.
+
+  - A column is functionally dependent when we could locate an AND-part of a
+    certain ON clause in form
+
+      tblX.columnY= expr
+
+    where expr is functionally-depdendent.
+
+  Apparently the above rules can be applied recursively. Also, certain entities
+  depend on multiple other entities. We model this by a bipartite graph which
+  has two kinds of nodes:
+
+  Value nodes:
+   - Table column values (each is a value of tblX.columnY)
+   - Table nodes (each node represents a table inside an eliminable join nest).
+  each value is either bound (i.e. functionally dependent) or not.
+
+  Module nodes:
+   - Nodes representing tblX.colY=expr equalities. Equality node has
+      = incoming edges from columns used in expr
+      = outgoing edge to tblX.colY column.
+   - Nodes representing unique keys. Unique key has
+      = incoming edges from key component value nodes
+      = outgoing edge to key's table node
+   - Inner side of outer join node. Outer join node has
+      = incoming edges from table value nodes
+      = No outgoing edges. Once we reach it, we know we can eliminate the
+        outer join.
+  A module may depend on multiple values, and hence its primary attribute is
+  the number of its depedencies that are not bound.
+
+  The algorithm starts with equality nodes that don't have any incoming edges
+  (their expressions are either constant or depend only on tables that are
+  outside of any outer joins) and proceeds to traverse dependency->dependant
+  edges until we've other traversed everything (TODO rephrase elaborate), or
+  we've reached the point where all outer join modules have zero unsatisfied
+  dependencies.
 */

 class Value_dep;
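The bipartite-graph traversal described in the new comment can be illustrated with a small sketch. It is not the code from opt_table_elimination.cc: the Module class, the eliminable_outer_joins function and the string node names are hypothetical, and values are reduced to plain names. What it keeps from the comment is the core idea that each module counts its unbound dependencies and fires once that count reaches zero.

```python
from collections import deque


class Module:
    """Hypothetical module node: an equality, a unique key, or an outer join."""

    def __init__(self, name, depends_on, binds=None):
        self.name = name
        self.depends_on = list(depends_on)  # incoming edges (value names)
        self.unbound = len(self.depends_on)  # dependencies not yet bound
        self.binds = binds                   # outgoing edge (value name) or None


def eliminable_outer_joins(modules, outer_joins):
    """Return names of outer join modules whose dependencies all become bound."""
    waiting = {}  # value name -> modules that depend on it
    for module in modules:
        for value in module.depends_on:
            waiting.setdefault(value, []).append(module)

    bound = set()
    # Start from modules with no incoming edges (e.g. col = <outside expr>).
    ready = deque(m for m in modules if m.unbound == 0)
    satisfied = []
    while ready:
        module = ready.popleft()
        if module.name in outer_joins:
            satisfied.append(module.name)  # no outgoing edges: join is removable
            continue
        if module.binds is None or module.binds in bound:
            continue
        bound.add(module.binds)
        for dependant in waiting.get(module.binds, []):
            dependant.unbound -= 1
            if dependant.unbound == 0:
                ready.append(dependant)
    return satisfied


# t1 LEFT JOIN t2 ON t2.pk = t1.col  (t1 is outside the outer join)
unique_case = [
    Module("t2.pk=t1.col", depends_on=[], binds="t2.pk"),
    Module("t2.PRIMARY", depends_on=["t2.pk"], binds="t2"),
    Module("oj(t2)", depends_on=["t2"]),
]
result_unique = eliminable_outer_joins(unique_case, {"oj(t2)"})

# t1 LEFT JOIN t2 ON t2.non_unique = t1.col  (the key column is never bound)
non_unique_case = [
    Module("t2.non_unique=t1.col", depends_on=[], binds="t2.non_unique"),
    Module("t2.PRIMARY", depends_on=["t2.pk"], binds="t2"),
    Module("oj(t2)", depends_on=["t2"]),
]
result_non_unique = eliminable_outer_joins(non_unique_case, {"oj(t2)"})
```

In the first case the equality binds t2.pk, the primary key then binds t2, and the outer join module is reached. In the second case the key module never fires, so nothing is reported as eliminable.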

Re: [Maria-developers] Sphinx Storage engine for MariaDB
by Andrew Aksyonoff 16 Aug '09

16 Aug '09

Hello Michael,

Looks like I've overlooked this email back then. :( Peter pinged me about
Sphinx vs Maria status recently and I just found it. Well, hopefully better
late than never!

Sunday, June 7, 2009, 1:18:30 PM, you wrote:

MW> Andrew, what are the possible drawbacks you can see with having
MW> Sphinx to be a part of MairaDB for a user that is not using Sphinx?

Can't think of any. SphinxSE is a mere client and as such does not allocate
any big RAM buffers or other resources.

MW> I assume that if Sphinx is not enabled, it will not take any
MW> resources.
MW> If Sphinx is enabled but not used, what are the resorces it would use?

Pretty much none, AFAIK.

---

On an unrelated note, we're working on so called RT backend here, and it
will allow all the normal CRUD operations in run time (as opposed to only
having reads against a static fulltext index that we have now).

When it's done it'll be also technically possible to integrate it too - and
do so tighter by embedding the library instead of just talking to Sphinx
searchd over network.

Sphinx searchd can now talk MySQL protocol and supports basic SQL syntax.
So for "just" full text tasks end users don't really need the integrated
version.

However it still might possibly be useful in certain use cases. To keep FT
index in (better) sync with DB data, avoid overheads of double network
roundtrips for additional processing, avoid hassles of keeping two
connections and manually managing two open transactions, etc.

So I wonder what'd be your opinion about the integration - whether it seems
useful at all, and if yes, whether network client or embedded library route
seems better.

--
Best regards,
Andrew mailto:shodan@shodan.ru

[Maria-developers] Rev 2729: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 16 Aug '09

16 Aug '09

At file:///home/psergey/dev/maria-5.1-table-elim-r10/

------------------------------------------------------------
revno: 2729
revision-id: psergey(a)askmonty.org-20090816143547-16hyle50tbt31xen
parent: psergey(a)askmonty.org-20090816124331-gd53m2alc0jb3ws4
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r10
timestamp: Sun 2009-08-16 17:35:47 +0300
message:
  MWL#17: Table elimination
  - Better comments
  - More OOM checks
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc 2009-08-16 12:43:31 +0000
+++ b/sql/opt_table_elimination.cc 2009-08-16 14:35:47 +0000
@@ -19,6 +19,25 @@
 /*
   OVERVIEW

+  This file contains table elimination module. The idea behind table
+  elimination is as follows: suppose we have a left join
+
+    SELECT * FROM t1 LEFT JOIN
+      (t2 JOIN t3) ON t3.primary_key=t1.col AND
+                      t4.primary_key=t2.col
+  such that
+  * columns of the inner tables are not used anywhere ouside the outer join
+    (not in WHERE, not in GROUP/ORDER BY clause, not in select list etc etc),
+  * inner side of the outer join is guaranteed to produce at most one matching
+    record combination for each record combination of outer tables.
+
+  then the inner side of the outer join can be removed from the query, as it
+  will always produce only one record combination (either real or
+  null-complemented one) and we don't care about what that record combination
+  is.
+
+  MODULE INTERFACE
+
   The module has one entry point - eliminate_tables() function, which one
   needs to call (once) at some point before the join optimization.
   eliminate_tables() operates over the JOIN structures. Logically, it
@@ -38,6 +57,50 @@
   by EXPLAIN code to check if the subquery should be shown in EXPLAIN.

   Table elimination is redone on every PS re-execution.
+
+  IMPLEMENTATION
+
+  As said above, we can remove inner side of an outer join if it is
+
+    1. not referred to from any other parts of the query
+    2. always produces one matching record combination.
+
+  We check #1 by doing a recursive descent down the join->join_list while
+  maintaining a union of used_tables() attribute of all expressions we've seen
+  "elsewhere". When we encounter an outer join, we check if the bitmap of
+  tables on its inner side has intersection with tables that are used
+  elsewhere. No intersection means that inner side of the outer join could
+  potentially be eliminated.
+
+  #2 is checked using a concept of values and modules that indicate
+  dependencies between them.
+
+  We start with
+  of certain values that functional dependencies between
+  them. There are two kinds of values:
+*/
+
+/*
+  A value
+
+  functional dependencies between two kinds of entities:
+*/
+
+class Value_dep;
+  class Field_value;
+  class Table_value;
+
+
+class Module_dep;
+  class Equality_module;
+  class Outer_join_module;
+  class Key_module;
+
+class Table_elimination;
+
+
+/*
+  A value.
 */

 class Value_dep : public Sql_alloc
@@ -55,13 +118,9 @@
   Value_dep *next;
 };

-class Field_value;
-class Table_value;
-class Outer_join_module;
-class Key_module;

 /*
-  A table field. There is only one such object for any tblX.fieldY
+  A table field value. There is exactly only one such object for any tblX.fieldY
   - the field epends on its table and equalities
   - expressions that use the field are its dependencies
 */
@@ -87,7 +146,8 @@


 /*
-  A table.
+  A table value. There is one Table_value object for every table that can
+  potentially be eliminated.
   - table depends on any of its unique keys
   - has its fields and embedding outer join as dependency.
 */
@@ -221,6 +281,7 @@
   MY_BITMAP expr_deps;
 };

+
 static
 bool build_eq_deps_for_cond(Table_elimination *te, Equality_module **fdeps,
                             uint *and_level, Item *cond,
@@ -244,6 +305,7 @@
 #ifndef DBUG_OFF
 static void dbug_print_deps(Table_elimination *te);
 #endif
+
 /*******************************************************************************************/

 /*
@@ -538,8 +600,8 @@

 static
 bool add_eq_dep(Table_elimination *te, Equality_module **eq_dep,
-                uint and_level, Item_func *cond,
-                Item *left, Item *right, table_map usable_tables)
+                uint and_level, Item_func *cond, Item *left, Item *right,
+                table_map usable_tables)
 {
   if ((left->used_tables() & usable_tables) &&
       !(right->used_tables() & RAND_TABLE_BIT) &&
@@ -565,7 +627,6 @@
         }
       }

-      /* Store possible eq field */
       (*eq_dep)->type= Module_dep::MODULE_EXPRESSION; //psergey-todo;
       if (!((*eq_dep)->field= get_field_value(te, field)))
         return TRUE;
@@ -651,7 +712,8 @@
                                                  table_map deps_map)
 {
   Outer_join_module *oj_dep;
-  oj_dep= new Outer_join_module(outer_join, my_count_bits(deps_map));
+  if (!(oj_dep= new Outer_join_module(outer_join, my_count_bits(deps_map))))
+    return NULL;
   te->n_outer_joins++;

   /*

=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc 2009-08-13 21:12:12 +0000
+++ b/sql/sql_select.cc 2009-08-16 14:35:47 +0000
@@ -8967,20 +8967,6 @@
   JOIN *join= last->join;
   while (last_emb)
   {
-    /*
-      psergey-elim: (nevermind)
-      new_prefix= cur_prefix & ~last;
-      if (!(new_prefix & cur_table_map)) // removed last inner table
-      {
-        join->cur_embedding_map&= ~last_emb->nested_join->nj_map;
-      }
-      else (current)
-      {
-        // Won't hurt doing it all the time:
-        join->cur_embedding_map |= ...;
-      }
-      else
-    */
     if (!(--last_emb->nested_join->counter))
       join->cur_embedding_map&= ~last_emb->nested_join->nj_map;
     else if (last_emb->nested_join->n_tables-1 ==
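The bitmap test for condition #1 in the IMPLEMENTATION comment can be sketched in a few lines. This is a simplified illustration only: it works on a flat list of outer joins instead of the recursive descent over join->join_list, and the function name elimination_candidates and the tuple layout are hypothetical. Each table is one bit, in the spirit of table_map.

```python
def elimination_candidates(outer_joins, used_elsewhere):
    """Return outer joins whose inner tables are not referenced elsewhere.

    outer_joins: list of (name, inner_tables_bitmap, on_expr_tables_bitmap).
    used_elsewhere: bitmap of tables used in the select list, WHERE,
    GROUP/ORDER BY and so on.
    """
    candidates = []
    for name, inner_tables, _own_on in outer_joins:
        elsewhere = used_elsewhere
        # ON expressions of the *other* outer joins also count as "elsewhere".
        for other_name, _other_inner, other_on in outer_joins:
            if other_name != name:
                elsewhere |= other_on
        if (inner_tables & elsewhere) == 0:
            candidates.append(name)
    return candidates


# One bit per table.
T1, T2, T3 = 1 << 0, 1 << 1, 1 << 2

# SELECT t1.a FROM t1
#   LEFT JOIN t2 ON t2.pk = t1.a
#   LEFT JOIN t3 ON t3.pk = t2.b
joins = [
    ("oj_t2", T2, T1 | T2),
    ("oj_t3", T3, T2 | T3),
]
candidates = elimination_candidates(joins, used_elsewhere=T1)
```

Here only oj_t3 passes the test, because t2 is still referenced by the ON expression of the second join. Passing the test only makes a join a candidate; condition #2 (at most one matching record combination) still has to be proven separately.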

[Maria-developers] Updated (by Guest): Add a mysqlbinlog option to change the used database (36)
by worklog-noreply＠askmonty.org 16 Aug '09

16 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to change the used database
CREATION DATE..: Fri, 07 Aug 2009, 14:57
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 36 (http://askmonty.org/worklog/?tid=36)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Sun, 16 Aug 2009, 17:11)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.27162 2009-08-16 17:11:12.000000000 +0300
+++ /tmp/wklog.36.new.27162 2009-08-16 17:11:12.000000000 +0300
@@ -13,6 +13,8 @@
   statement refers to tables in current database, so that changing the current
   database will make the statement to work on a table in a different database).

+See also MySQL BUG#42941.
+
 What we could do
 ----------------


-=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13035 2009-08-10 15:41:51.000000000 +0300
+++ /tmp/wklog.36.new.13035 2009-08-10 15:41:51.000000000 +0300
@@ -1,5 +1,7 @@
 Context
 -------
+(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
+overview)
 At the moment, the server has a replication slave option

   --replicate-rewrite-db="from->to"

-=-=(Guest - Mon, 10 Aug 2009, 11:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.6580 2009-08-10 11:12:36.000000000 +0300
+++ /tmp/wklog.36.new.6580 2009-08-10 11:12:36.000000000 +0300
@@ -1,4 +1,3 @@
-
 Context
 -------
 At the moment, the server has a replication slave option
@@ -67,6 +66,6 @@

 It will be possible to do the rewrites either on the slave (
 --replicate-rewrite-db will work for all kinds of statements), or in
-mysqlbinlog (adding a comment is easy and doesn't require use to parse the
-statement).
+mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to
+parse the statement).


-=-=(Psergey - Sun, 09 Aug 2009, 23:53)=-=-
High-Level Specification modified.
--- /tmp/wklog.36.old.13425 2009-08-09 23:53:54.000000000 +0300
+++ /tmp/wklog.36.new.13425 2009-08-09 23:53:54.000000000 +0300
@@ -1 +1,72 @@

+Context
+-------
+At the moment, the server has a replication slave option
+
+  --replicate-rewrite-db="from->to"
+
+the option affects
+- Table_map_log_event (all RBR events)
+- Load_log_event (LOAD DATA)
+- Query_log_event (SBR-based updates, with the usual assumption that the
+  statement refers to tables in current database, so that changing the current
+  database will make the statement to work on a table in a different database).
+
+What we could do
+----------------
+
+Option1: make mysqlbinlog accept --replicate-rewrite-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
+same extent as replication slave would process --replicate-rewrite-db option.
+
+
+Option2: Add database-agnostic RBR events and --strip-db option
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Right now RBR events require a databasename. It is not possible to have RBR
+event stream that won't mention which database the events are for. When I
+tried to use debugger and specify empty database name, attempt to apply the
+binlog resulted in this error:
+
+090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
+opening tables,
+
+We could do as follows:
+- Make the server interpret empty database name in RBR event (i.e. in a
+  Table_map_log_event) as "use current database". Binlog slave thread
+  probably should not allow such events as it doesn't have a natural current
+  database.
+- Add a mysqlbinlog --strip-db option that would
+  = not produce any "USE dbname" statements
+  = change databasename for all RBR events to be empty
+
+That way, mysqlbinlog output will be database-agnostic and apply to the
+current database.
+(this will have the usual limitations that we assume that all statements in
+the binlog refer to the current database).
+
+Option3: Enhance database rewrite
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If there is a need to support database change for statements that use
+dbname.tablename notation and are replicated as statements (i.e. are DDL
+statements and/or DML statements that are binlogged as statements),
+then that could be supported as follows:
+
+- Make the server's parser recognize special form of comments
+
+    /* !database-alias(oldname,newname) */
+
+  and save the mapping somewhere
+
+- Put the hooks in table open and name resolution code to use the saved
+  mapping.
+
+
+Once we've done the above, it will be easy to perform a complete,
+no-compromise or restrictions database name change in binary log.
+
+It will be possible to do the rewrites either on the slave (
+--replicate-rewrite-db will work for all kinds of statements), or in
+mysqlbinlog (adding a comment is easy and doesn't require use to parse the
+statement).
+

-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36

-=-=(Psergey - Fri, 07 Aug 2009, 14:57)=-=-
Title modified.
--- /tmp/wklog.36.old.14687 2009-08-07 14:57:49.000000000 +0300
+++ /tmp/wklog.36.new.14687 2009-08-07 14:57:49.000000000 +0300
@@ -1 +1 @@
-Add a mysqlbinlog option to change the database
+Add a mysqlbinlog option to change the used database


DESCRIPTION:

Sometimes there is a need to take a binary log and apply it to a database with
a different name than the original name of the database on binlog producer.

If one is using statement-based replication, he can achieve this by grepping
out "USE dbname" statements out of the output of mysqlbinlog(*).
With row-based replication this is no longer possible, as the database name is
encoded within the BINLOG '....' statement.

This task is about adding an option to mysqlbinlog that would allow changing
the names of used databases in both RBR and SBR events.

(*) this implies that all statements refer to tables in the current database,
doesn't catch updates made inside stored functions and so forth, but still
works for a practically-important subset of cases.

HIGH-LEVEL SPECIFICATION:

Context
-------
(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
overview)
At the moment, the server has a replication slave option

  --replicate-rewrite-db="from->to"

the option affects
- Table_map_log_event (all RBR events)
- Load_log_event (LOAD DATA)
- Query_log_event (SBR-based updates, with the usual assumption that the
  statement refers to tables in current database, so that changing the current
  database will make the statement work on a table in a different database).

See also MySQL BUG#42941.

What we could do
----------------

Option1: make mysqlbinlog accept --replicate-rewrite-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Make mysqlbinlog accept --replicate-rewrite-db options and process them to the
same extent as replication slave would process --replicate-rewrite-db option.

Option2: Add database-agnostic RBR events and --strip-db option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Right now RBR events require a databasename. It is not possible to have an RBR
event stream that won't mention which database the events are for. When I
tried to use a debugger and specify an empty database name, the attempt to
apply the binlog resulted in this error:

090809 17:38:44 [ERROR] Slave SQL: Error 'Table '.tablename' doesn't exist' on
opening tables,

We could do as follows:
- Make the server interpret empty database name in RBR event (i.e. in a
  Table_map_log_event) as "use current database".
Binlog slave thread probably should not allow such events as it doesn't have a natural current database. - Add a mysqlbinlog --strip-db option that would = not produce any "USE dbname" statements = change databasename for all RBR events to be empty That way, mysqlbinlog output will be database-agnostic and apply to the current database. (this will have the usual limitations that we assume that all statements in the binlog refer to the current database). Option3: Enhance database rewrite ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If there is a need to support database change for statements that use dbname.tablename notation and are replicated as statements (i.e. are DDL statements and/or DML statements that are binlogged as statements), then that could be supported as follows: - Make the server's parser recognize special form of comments /* !database-alias(oldname,newname) */ and save the mapping somewhere - Put the hooks in table open and name resolution code to use the saved mapping. Once we've done the above, it will be easy to perform a complete, no-compromise or restrictions database name change in binary log. It will be possible to do the rewrites either on the slave ( --replicate-rewrite-db will work for all kinds of statements), or in mysqlbinlog (adding a comment is easy and doesn't require mysqlbinlog to parse the statement). ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)
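The statement-based workaround mentioned in the DESCRIPTION above (grepping
"USE dbname" statements out of the mysqlbinlog output) can be sketched as
follows. This is only an illustration, not part of the worklog entry: the
function name is made up, and the regular expression assumes that each USE
statement sits on its own line, optionally with backquotes and a trailing
";" or "/*!*/;" delimiter. As the DESCRIPTION notes, this does nothing for
row-based events, where the database name is inside the BINLOG statement.

```python
import re

# Assumption: a USE statement occupies a whole line of the dump, e.g.
#   use `db1`/*!*/;      or      USE db2;
USE_STMT = re.compile(
    r"^\s*use\s+`?[A-Za-z0-9_$]+`?\s*(/\*!\*/)?\s*;?\s*$",
    re.IGNORECASE,
)


def strip_use_statements(dump_text):
    """Return dump_text with standalone USE statements removed.

    Hypothetical helper; all other lines are passed through unchanged.
    """
    kept = [line for line in dump_text.splitlines() if not USE_STMT.match(line)]
    return "\n".join(kept)


sample = "\n".join([
    "use `db1`/*!*/;",
    "INSERT INTO t1 VALUES (1);",
    "USE db2;",
    "UPDATE t1 SET a = a + 1;",
])

# Only the INSERT and UPDATE lines remain, so the statements apply to
# whatever database is current when the output is fed to the client.
print(strip_use_statements(sample))
```

Statements that name the database explicitly (dbname.tablename) pass through
untouched, which is the limitation Option3 is meant to address.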



[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply＠askmonty.org 16 Aug '09

16 Aug '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Table elimination CREATION DATE..: Sun, 10 May 2009, 19:57 SUPERVISOR.....: Monty IMPLEMENTOR....: Psergey COPIES TO......: CATEGORY.......: Server-Sprint TASK ID........: 17 (http://askmonty.org/worklog/?tid=17) VERSION........: 9.x STATUS.........: In-Progress PRIORITY.......: 60 WORKED HOURS...: 1 ESTIMATE.......: 3 (hours remain) ORIG. ESTIMATE.: 3 PROGRESS NOTES: -=-=(Guest - Sun, 16 Aug 2009, 16:16)=-=- Category updated. --- /tmp/wklog.17.old.24882 2009-08-16 16:16:49.000000000 +0300 +++ /tmp/wklog.17.new.24882 2009-08-16 16:16:49.000000000 +0300 @@ -1 +1 @@ -Client-BackLog +Server-Sprint -=-=(Guest - Sun, 16 Aug 2009, 16:16)=-=- Version updated. --- /tmp/wklog.17.old.24882 2009-08-16 16:16:49.000000000 +0300 +++ /tmp/wklog.17.new.24882 2009-08-16 16:16:49.000000000 +0300 @@ -1 +1 @@ -Server-5.1 +9.x -=-=(Guest - Wed, 29 Jul 2009, 21:41)=-=- Low Level Design modified. --- /tmp/wklog.17.old.26011 2009-07-29 21:41:04.000000000 +0300 +++ /tmp/wklog.17.new.26011 2009-07-29 21:41:04.000000000 +0300 @@ -2,163 +2,146 @@ ~maria-captains/maria/maria-5.1-table-elimination tree. <contents> -1. Conditions for removal -1.1 Quick check if there are candidates -2. Removal operation properties -3. Removal operation -4. User interface -5. Tests and benchmarks -6. Todo, issues to resolve -6.1 To resolve -6.2 Resolved -7. Additional issues +1. Elimination criteria +2. No outside references check +2.1 Quick check if there are tables with no outside references +3. 
One-match check +3.1 Functional dependency source #1: Potential eq_ref access +3.2 Functional dependency source #2: col2=func(col1) +3.3 Functional dependency source #3: One or zero records in the table +3.4 Functional dependency check implementation +3.4.1 Equality collection: Option1 +3.4.2 Equality collection: Option2 +3.4.3 Functional dependency propagation - option 1 +3.4.4 Functional dependency propagation - option 2 +4. Removal operation properties +5. Removal operation +6. User interface +6.1 @@optimizer_switch flag +6.2 EXPLAIN [EXTENDED] +7. Miscellaneous adjustments +7.1 Fix used_tables() of aggregate functions +7.2 Make subquery predicates collect their outer references +8. Other concerns +8.1 Relationship with outer->inner joins converter +8.2 Relationship with prepared statements +8.3 Relationship with constant table detection +9. Tests and benchmarks </contents> It's not really about elimination of tables, it's about elimination of inner sides of outer joins. -1. Conditions for removal -------------------------- -We can eliminate an inner side of outer join if: -1. For each record combination of outer tables, it will always produce - exactly one record. -2. There are no references to columns of the inner tables anywhere else in +1. Elimination criteria +======================= +We can eliminate inner side of an outer join nest if: + +1. There are no references to columns of the inner tables anywhere else in the query. +2. For each record combination of outer tables, it will always produce + exactly one matching record combination. + +Most of effort in this WL entry is checking these two conditions. -#1 means that every table inside the outer join nest is: - - is a constant table: - = because it can be accessed via eq_ref(const) access, or - = it is a zero-rows or one-row MyISAM-like table [MARK1] - - has an eq_ref access method candidate. 
- -#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY, - GROUP BY and HAVING do not refer to the inner tables of the outer join - nest. - -1.1 Quick check if there are candidates -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Before we start to enumerate join nests, here is a quick way to check if -there *can be* something to be removed: +2. No outside references check +============================== +Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent +outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner +tables of the outer join nest we're trying to remove. + +For multi-table UPDATE/DELETE we also must not remove tables that we're +updating/deleting from or tables that are used in UPDATE's SET clause. + +2.1 Quick check if there are tables with no outside references +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Before we start searching for outer join nests that could be eliminated, +we'll do a quick and cheap check if there possibly could be something that +could be eliminated: - if ((tables used in select_list | + if (there are outer joins && + (tables used in select_list | tables used in group/order by UNION | - tables used in where) != bitmap_of_all_tables) + tables used in where) != bitmap_of_all_join_tables) { attempt table elimination; } -2. Removal operation properties -------------------------------- -* There is always one way to remove (no choice to remove either this or that) -* It is always better to remove as much tables as possible (at least within - our cost model). -Thus, no need for any cost calculations/etc. It's an unconditional rewrite. -3. Removal operation --------------------- -* Remove the outer join nest's nested join structure (i.e. get the - outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding, - $OJ->embedding->nested_join. Update table_map's of all ancestor nested - joins). [MARK2] +3. 
One-match check +================== +We can eliminate inner side of outer join if it will always generate exactly +one matching record combination. -* Move the tables and their JOIN_TABs to front like it is done with const - tables, with exception that if eliminated outer join nest was within - another outer join nest, that shouldn't prevent us from moving away the - eliminated tables. +By definition of OUTER JOIN, a NULL-complemented record combination will be +generated when the inner side of outer join has not produced any matches. -* Update join->table_count and all-join-tables bitmap. +What remains to be checked is that there is no possibility that the inner side of +the outer join could produce more than one matching record combination. -* That's it. Nothing else? +We'll refer to one-match property as "functional dependency": -4. User interface ------------------ -* We'll add an @@optimizer switch flag for table elimination. Tentative - name: 'table_elimination'. - (Note ^^ utility of the above questioned ^, as table elimination can never - be worse than no elimination. We're leaning towards not adding the flag) - -* EXPLAIN will not show the removed tables at all. This will allow to check - if tables were removed, and also will behave nicely with anchor model and - VIEWs: stuff that user doesn't care about just won't be there. +- An outer join nest is functionally dependent [wrt outer tables] if it will + produce one matching record combination per each record combination of + outer tables -5. Tests and benchmarks ------------------------ -Create a benchmark in sql-bench which checks if the DBMS has table -elimination. -[According to Monty] Run - - queries that would use elimination - - queries that are very similar to one above (so that they would have same - QEP, execution cost, etc) but cannot use table elimination. -then compare run times and make a conclusion about whether dbms supports table -elimination.
+- A table is functionally dependent wrt certain set of dependency tables, if + record combination of dependency tables uniquely identifies zero or one + matching record in the table -6. Todo, issues to resolve --------------------------- +- Definitions of functional dependency of keys (=column tuples) and columns are + apparent. -6.1 To resolve -~~~~~~~~~~~~~~ -- Relationship with prepared statements. - On one hand, it's natural to desire to make table elimination a - once-per-statement operation, like outer->inner join conversion. We'll have - to limit the applicability by removing [MARK1] as that can change during - lifetime of the statement. - - The other option is to do table elimination every time. This will require to - rework operation [MARK2] to be undoable. - - I'm leaning towards doing the former. With anchor modeling, it is unlikely - that we'll meet outer joins which have N inner tables of which some are 1-row - MyISAM tables that do not have primary key. - -6.2 Resolved -~~~~~~~~~~~~ -* outer->inner join conversion is not a problem for table elimination. - We make outer->inner conversions based on predicates in WHERE. If the WHERE - referred to an inner table (requirement for OJ->IJ conversion) then table - elimination would not be applicable anyway. - -* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause: - - affected tables must not be eliminated - - tables that are used on the right side of the SET x=y assignments must - not be eliminated either. +Our goal is to prove that the entire join nest is functionally-dependent. -* Aggregate functions used to report that they depend on all tables, that is, +Join nest is functionally dependent (on the outside tables) if each of its +elements (those can be either base tables or join nests) is functionally +dependent.
- item_agg_func->used_tables() == (1ULL << join->tables) - 1 +Functional dependency is transitive: if table A is f-dependent on the outer +tables and table B is f-dependent on {A, outer_tables} then B is functionally +dependent on the outer tables. + +Subsequent sections list cases when we can declare a table to be +functionally-dependent. + +3.1 Functional dependency source #1: Potential eq_ref access +------------------------------------------------------------ +This is the most practically-important case. Taking the example from the HLD +of this WL entry: + + select + A.colA + from + tableA A + left outer join + tableB B + on + B.id = A.id; - always. Fixed it, now aggregate function reports it depends on - tables that its arguments depend on. In particular, COUNT(*) reports - that it depends on no tables (item_count_star->used_tables()==0). - One consequence of that is that "item->used_tables()==0" is not - equivalent to "item->const_item()==true" anymore (not sure if it's - "anymore" or this has been already happening). - -* EXPLAIN EXTENDED warning text was generated after the JOIN object has - been discarded. This didn't allow to use information about join plan - when printing the warning. Fixed this by keeping the JOIN objects until - we've printed the warning (have also an intent to remove the const - tables from the join output). - -7. Additional issues --------------------- -* We remove ON clauses within outer join nests. If these clauses contain - subqueries, they probably should be gone from EXPLAIN output also? - Yes. Current approach: when removing an outer join nest, walk the ON clause - and mark subselects as eliminated. Then let EXPLAIN code check if the - SELECT was eliminated before the printing (EXPLAIN is generated by doing - a recursive descent, so the check will also cause children of eliminated - selects not to be printed) - -* Table elimination is performed after constant table detection (but before - the range analysis).
Constant tables are technically different from - eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't). - Considering we've already done the join_read_const_table() call, is there any - real difference between constant table and eliminated one? If there is, should - we mark const tables also as eliminated? - from user/EXPLAIN point of view: no. constant table is the one that we read - one record from. eliminated table is the one that we don't acccess at all. +and generalizing it: a table TBL is functionally-dependent if the ON +expression allows to construct a potential eq_ref access to table TBL that +uses only outer or functionally-dependent tables. + +In other words: table TBL will have one match if the ON expression can be +converted into this form + + TBL.unique_key=func(one_match_tables) AND .. remainder ... + +(with appropriate extension for multi-part keys), where + + one_match_tables= { + tables that are not on the inner side of the outer join in question, and + functionally dependent tables + } + +Note that this will cover constant tables, except those that are constant because +they have 0/1 record or are partitioned and have no used partitions. + + +3.2 Functional dependency source #2: col2=func(col1) +---------------------------------------------------- +This comes from the second example in the HLS: -* What is described above will not be able to eliminate this outer join create unique index idx on tableB (id, fromDate); ... left outer join @@ -169,32 +152,331 @@ B.fromDate = (select max(sub.fromDate) from tableB sub where sub.id = A.id); - This is because condition "B.fromDate= func(tableB)" cannot be used. - Reason#1: update_ref_and_keys() does not consider such conditions to - be of any use (and indeed they are not usable for ref access) - so they are not put into KEYUSE array. 
- Reason#2: even if they were put there, we would need to be able to tell - between predicates like - B.fromDate= func(B.id) // guarantees only one matching row as - // B.id is already bound by B.id=A.id - // hence B.fromDate becomes bound too. - and - "B.fromDate= func(B.*)" // Can potentially have many matching - // records. - We need to - - Have update_ref_and_keys() create KEYUSE elements for such equalities - - Have eliminate_tables() and friends make a more accurate check. - The right check is to check whether all parts of a unique key are bound. - If we have keypartX to be bound, then t.keypartY=func(keypartX) makes - keypartY to be bound. - The difficulty here is that correlated subquery predicate cannot tell what - columns it depends on (it only remembers tables). - Traversing the predicate is expensive and complicated. - We're leaning towards making each subquery predicate have a List<Item> with - items that - - are in the current select - - and it depends on. - This list will be useful in certain other subquery optimizations as well, - it is cheap to collect it in fix_fields() phase, so it will be collected - for every subquery predicate. +Here it is apparent that tableB can be eliminated. It is not possible to +construct eq_ref access to tableB, though, because for the second part of the +primary key (fromDate column) we only got a condition in this form: + + B.fromDate= func(tableB) + +(we write "func(tableB)" because ref optimizer can only determine which tables +the right part of the equality depends on). + +In general case, equality like this doesn't guarantee functional dependency. +For example, if func() == { return fromDate;}, i.e the ON expression is + + ... ON B.id = A.id and B.fromDate = B.fromDate + +then that would allow table B to have multiple matches per record of table A. 
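The difference between the two cases can be seen on a tiny data set. The
sketch below is only an illustration with made-up rows: it evaluates both ON
conditions by brute force and counts how many rows of tableB match one row of
tableA. The helper names (matches, latest_from_date) are hypothetical, and
latest_from_date merely stands in for the correlated subquery from the
example.

```python
# Hypothetical sample data; (id, fromDate) is unique in table_b, as with
# the unique index idx on tableB (id, fromDate).
table_a = [{"id": 1}]
table_b = [
    {"id": 1, "fromDate": "2009-01-01"},
    {"id": 1, "fromDate": "2009-06-01"},
]


def matches(row_a, on_condition):
    """Rows of table_b that satisfy the ON condition for one row of table_a."""
    return [row_b for row_b in table_b if on_condition(row_a, row_b)]


def latest_from_date(row_a):
    """Stand-in for: select max(sub.fromDate) from tableB sub where sub.id = A.id"""
    return max(r["fromDate"] for r in table_b if r["id"] == row_a["id"])


# Case 1: the right side depends only on A's row, so it is a single fixed
# value once A's row is known, and the whole unique key becomes bound.
bound = matches(
    table_a[0],
    lambda a, b: b["id"] == a["id"] and b["fromDate"] == latest_from_date(a),
)

# Case 2: B.fromDate = B.fromDate is true for every row and binds nothing.
unbound = matches(
    table_a[0],
    lambda a, b: b["id"] == a["id"] and b["fromDate"] == b["fromDate"],
)

print(len(bound), len(unbound))
```

Both conditions look like "B.fromDate = func(tableB)" at table granularity,
yet only the first one leaves at most one match per row of A.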
+ +In order to be able to distinguish between these two cases, we'll need to go +down to column level: + +- A table is functionally dependent if it has a unique key that's functionally + dependent + +- A unique key is functionally dependent when all of its columns are + functionally dependent + +- A table column is functionally dependent if the ON clause allows to extract + an AND-part in this form: + + tbl.column = f(functionally-dependent columns or columns of outer tables) + +3.3 Functional dependency source #3: One or zero records in the table +--------------------------------------------------------------------- +A table with one or zero records cannot generate more than one matching +record. This source is of lesser importance as one/zero-record tables are only +MyISAM tables. + +3.4 Functional dependency check implementation +---------------------------------------------- +As shown above, we need something similar to KEYUSE structures, but not +exactly that (we need things that current ref optimizer considers unusable and +don't need things that it considers usable). + +3.4.1 Equality collection: Option1 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We could +- extend KEYUSE structures to store all kinds of equalities we need +- change update_ref_and_keys() and co. to collect equalities both for ref + access and for table elimination + = [possibly] Improve [eq_]ref access to be able to use equalities in + form keypart2=func(keypart1) +- process the KEYUSE array both by table elimination and by ref access + optimizer. + ++ This requires less effort. +- Code will have to be changed all over sql_select.cc +- update_ref_and_keys() and co. already do several unrelated things. Hooking + up table elimination will make it even worse. + +3.4.2 Equality collection: Option2 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Alternatively, we could process the WHERE clause totally on our own. ++ Table elimination is standalone and easy to detach module. 
+- Some code duplication with update_ref_and_keys() and co.
+
+Having got the equalities, we'll to propagate functional dependency property
+to unique keys, tables and, ultimately, join nests.
+
+3.4.3 Functional dependency propagation - option 1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Borrow the approach used in constant table detection code:
+
+  do
+  {
+    converted= FALSE;
+    for each table T in join nest
+    {
+      if (check_if_functionally_dependent(T))
+        converted= TRUE;
+    }
+  } while (converted == TRUE);
+
+  check_if_functionally_dependent(T)
+  {
+    if (T has eq_ref access based on func_dep_tables)
+      return TRUE;
+
+    Apply the same do-while loop-based approach to available equalities
+      T.column1=func(other columns)
+    to spread the set of functionally-dependent columns. The goal is to get
+    all columns of a certain unique key to be bound.
+  }
+
+
+3.4.4 Functional dependency propagation - option 2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Analyze the ON expression(s) and build a list of
+
+  tbl.field = expr(...)
+
+equalities. tbl here is a table that belongs to a join nest that could
+potentially be eliminated.
+
+besides those, add to the list
+ - An element for each unique key in the table that needs to be eliminated
+ - An element for each table that needs to be eliminated
+ - An element for each join nest that can be eliminated (i.e. has no
+   references from outside).
+
+Then, setup "reverse dependencies": each element should have pointers to
+elements that are functionally dependent on it:
+
+- "tbl.field=expr(...)" equality is functionally dependent on all fields that
+  are used in "expr(...)" (here we take into account only fields that belong
+  to tables that can potentially be eliminated).
+- a unique key is dependent on all of its components
+- a table is dependent on all of its unique keys
+- a join nest is dependent on all tables that it contains
+
+These pointers are stored in form of one bitmap, such that:
+
+  "X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )
+
+Each object also stores a number of dependencies it needs to be satisfied
+before it itself is satisfied:
+
+- "tbl.field=expr(...)" needs all its underlying fields (if a field is
+  referenced many times it is counted only once)
+
+- a unique key needs all of its key parts
+
+- a table needs only one of its unique keys
+
+- a join nest needs all of its tables
+
+(TODO: so what do we do when we've marked a table as constant? We'll need to
+update the "field=expr(....)" elements that use fields of that table. And the
+problem is that we won't know how much to decrement from the counters of those
+elements.
+
+Solution#1: switch to table_map() based approach.
+Solution#2: introduce separate elements for each involved field.
+            field will depend on its table,
+            "field=expr" will depend on fields.
+)
+
+Besides the above, let each element have a pointer to another element, so that
+we can have a linked list of elements.
+
+After the above structures have been created, we start the main algorithm.
+
+The first step is to create a list of functionally-dependent elements. We walk
+across array of dependencies and mark those elements that are already bound
+(i.e. their dependencies are satisfied). At the moment those immediately-bound
+are only "field=expr" dependencies that don't refer to any columns that are
+not bound.
+
+The second step is the loop
+
+  while (bound_list is not empty)
+  {
+    Take the first bound element F off the list.
+    Use the bitmap to find out what other elements depended on it
+    for each such element E
+    {
+      if (E becomes bound after F is bound)
+        add E to the list;
+    }
+  }
+
+The last step is to walk through elements that represent the join nests. Those
+that are bound can be eliminated.
+
+4. Removal operation properties
+===============================
+* There is always one way to remove (no choice to remove either this or that)
+* It is always better to remove as much tables as possible (at least within
+  our cost model).
+Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
+
+
+5. Removal operation
+====================
+(This depends a lot on whether we make table elimination a one-off rewrite or
+conditional)
+
+At the moment table elimination is re-done for each join re-execution, hence
+the removal operation is designed not to modify any statement's permanent
+members.
+
+* Remove the outer join nest's nested join structure (i.e. get the
+  outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
+  $OJ->embedding->nested_join. Update table_map's of all ancestor nested
+  joins). [MARK2]
+
+* Move the tables and their JOIN_TABs to the front of join order, like it is
+  done with const tables, with exception that if eliminated outer join nest
+  was within another outer join nest, that shouldn't prevent us from moving
+  away the eliminated tables.
+
+* Update join->table_count and all-join-tables bitmap.
+  ^ TODO: not true anymore ^
+
+* That's it. Nothing else?
+
+6. User interface
+=================
+
+6.1 @@optimizer_switch flag
+---------------------------
+Argument againist adding the flag:
+* It is always better to perform table elimination than not to do it.
+
+Arguments for the flag:
+* It is always theoretically possible that the new code will cause unintended
+  slowdowns.
+* Having the flag is useful for QA and comparative benchmarking.
+
+Decision so far: add the flag under #ifdef. Make the flag be present in debug
+builds.
+
+6.2 EXPLAIN [EXTENDED]
+----------------------
+There are two possible options:
+1. Show eliminated tables, like we do with const tables.
+2. Do not show eliminated tables.
+
+We chose option 2, because:
+- the table is not accessed at all (besides locking it)
+- it is more natural for anchor model user - when he's querying an anchor-
+  and attributes view, he doesn't care about the unused attributes.
+
+EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either.
+
+NOTE: Before this WL, the warning text was generated after all JOIN objects
+have been destroyed. This didn't allow to use information about join plan
+when printing the warning. We've fixed this by keeping the JOIN objects until
+the warning text has been generated.
+
+Table elimination removes inner sides of outer join, and logically the ON
+clause is also removed. If this clause has any subqueries, they will be
+also removed from EXPLAIN output.
+
+An exception to the above is that if we eliminate a derived table, it will
+still be shown in EXPLAIN output. This comes from the fact that the FROM
+subqueries are evaluated before table elimination is invoked.
+TODO: Is the above ok or still remove parts of FROM subqueries?
+
+7. Miscellaneous adjustments
+============================
+
+7.1 Fix used_tables() of aggregate functions
+--------------------------------------------
+Aggregate functions used to report that they depend on all tables, that is,
+
+  item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+always. Fixed it, now aggregate function reports that it depends on the
+tables that its arguments depend on. In particular, COUNT(*) reports that it
+depends on no tables (item_count_star->used_tables()==0). One consequence of
+that is that "item->used_tables()==0" is not equivalent to
+"item->const_item()==true" anymore (not sure if it's "anymore" or this has
+been already so for some items).
+
+7.2 Make subquery predicates collect their outer references
+-----------------------------------------------------------
+Per-column functional dependency analysis requires us to take a
+
+  tbl.field = func(...)
+
+equality and tell which columns of which tables are referred from func(...)
+expression. For scalar expressions, this is accomplished by Item::walk()-based
+traversal. It should be reasonably cheap (the only practical Item that can be
+expensive to traverse seems to be a special case of "col IN (const1,const2,
+...)". check if we traverse the long list for such items).
+
+For correlated subqueries, traversal can be expensive, it is cheaper to make
+each subquery item have a list of its outer references. The list can be
+collected at fix_fields() stage with very little extra cost, and then it could
+be used for other optimizations.
+
+
+8. Other concerns
+=================
+
+8.1 Relationship with outer->inner joins converter
+--------------------------------------------------
+One could suspect that outer->inner join conversion could get in the way
+of table elimination by changing outer joins (which could be eliminated)
+to inner (which we will not try to eliminate).
+This concern is not valid: we make outer->inner conversions based on
+predicates in WHERE. If the WHERE referred to an inner table (this is a
+requirement for the conversion) then table elimination would not be
+applicable anyway.
+
+8.2 Relationship with prepared statements
+-----------------------------------------
+On one hand, it's natural to desire to make table elimination a
+once-per-statement operation, like outer->inner join conversion. We'll have
+to limit the applicability by removing [MARK1] as that can change during
+lifetime of the statement.
+
+The other option is to do table elimination every time. This will require to
+rework operation [MARK2] to be undoable.
+
+
+8.3 Relationship with constant table detection
+----------------------------------------------
+Table elimination is performed after constant table detection (but before
+the range analysis). Constant tables are technically different from
+eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+Considering we've already done the join_read_const_table() call, is there any
+real difference between constant table and eliminated one? If there is, should
+we mark const tables also as eliminated?
+from user/EXPLAIN point of view: no. constant table is the one that we read
+one record from. eliminated table is the one that we don't acccess at all.
+TODO
+
+9. Tests and benchmarks
+=======================
+Create a benchmark in sql-bench which checks if the DBMS has table
+elimination.
+[According to Monty] Run
+ - query Q1 that would use elimination
+ - query Q2 that is very similar to Q1 (so that they would have same
+   QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether the used dbms
+supports table elimination.

-=-=(Guest - Thu, 23 Jul 2009, 20:07)=-=-
Dependency created: 29 now depends on 17

-=-=(Monty - Thu, 23 Jul 2009, 09:19)=-=-
Version updated.
--- /tmp/wklog.17.old.24090 2009-07-23 09:19:32.000000000 +0300
+++ /tmp/wklog.17.new.24090 2009-07-23 09:19:32.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+Server-5.1

-=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=-
deukje weg

Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours).

-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x

-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x

-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog

-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
 from user/EXPLAIN point of view: no. constant table is the one that we read
 one record from. eliminated table is the one that we don't acccess at all.
+* What is described above will not be able to eliminate this outer join
+    create unique index idx on tableB (id, fromDate);
+    ...
+    left outer join
+      tableB B
+    on
+      B.id = A.id
+    and
+      B.fromDate = (select max(sub.fromDate)
+                    from tableB sub where sub.id = A.id);
+
+  This is because condition "B.fromDate= func(tableB)" cannot be used.
+   Reason#1: update_ref_and_keys() does not consider such conditions to
+   be of any use (and indeed they are not usable for ref access)
+   so they are not put into KEYUSE array.
+   Reason#2: even if they were put there, we would need to be able to tell
+   between predicates like
+     B.fromDate= func(B.id)    // guarantees only one matching row as
+                               // B.id is already bound by B.id=A.id
+                               // hence B.fromDate becomes bound too.
+   and
+     "B.fromDate= func(B.*)"   // Can potentially have many matching
+                               // records.
+  We need to
+  - Have update_ref_and_keys() create KEYUSE elements for such equalities
+  - Have eliminate_tables() and friends make a more accurate check.
+  The right check is to check whether all parts of a unique key are bound.
+  If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+  keypartY to be bound.
+  The difficulty here is that correlated subquery predicate cannot tell what
+  columns it depends on (it only remembers tables).
+  Traversing the predicate is expensive and complicated.
+  We're leaning towards making each subquery predicate have a List<Item> with
+  items that
+  - are in the current select
+  - and it depends on.
+  This list will be useful in certain other subquery optimizations as well,
+  it is cheap to collect it in fix_fields() phase, so it will be collected
+  for every subquery predicate.
+

------------------------------------------------------------
-=-=(View All Progress Notes, 28 total)=-=-
http://askmonty.org/worklog/index.pl?tid=17&nolimit=1

DESCRIPTION:

Eliminate unneeded tables from SELECT queries.

This will speed up some views and automatically generated queries.

Example:

  CREATE TABLE B (id int primary key);

  select
    A.colA
  from
    tableA A
  left outer join
    tableB B
  on
    B.id = A.id;

In this case we can remove table B and the join from the query.

HIGH-LEVEL SPECIFICATION:

Here is an extended explanation of table elimination.

Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table elimination,
but not to the same extent.

Basically, what table elimination does is remove tables from the execution
plan when it is unnecessary to include them. This can, of course, only happen
if the right circumstances arise. Let us for example look at the following
query:

  select
    A.colA
  from
    tableA A
  left outer join
    tableB B
  on
    B.id = A.id;

When using A as the left table we ensure that the query will return at least
as many rows as there are in that table. For rows where the join condition
(B.id = A.id) is not met, the selected column (A.colA) will still contain its
original value. The unseen B.* row would contain all NULLs.

However, the result set could actually contain more rows than are found in
tableA if there are duplicates of the column B.id in tableB. If A contains a
row [1, "val1"] and B the rows [1, "other1a"], [1, "other1b"], then two rows
will match the join condition. The only way to know what the result will look
like is to actually touch both tables during execution.

Instead, let's say that tableB contains rows that make it possible to place a
unique constraint on the column B.id, for example (and this is often the case)
a primary key.
In this situation we know that we will get exactly as many rows as there are
in tableA, since joining with tableB cannot introduce any duplicates. If,
further, as in the example query, we do not select any columns from tableB,
touching that table during execution is unnecessary. We can remove the whole
join operation from the execution plan.

Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination in the
case described above. Let us look at a more advanced query, where Oracle
fails.

  select
    A.colA
  from
    tableA A
  left outer join
    tableB B
  on
    B.id = A.id
  and
    B.fromDate = (
      select
        max(sub.fromDate)
      from
        tableB sub
      where
        sub.id = A.id
    );

In this example we have added another join condition, which ensures that we
only pick the matching row from tableB having the latest fromDate. In this
case tableB will contain duplicates of the column B.id, so in order to ensure
uniqueness the primary key has to contain the fromDate column as well. In
other words, the primary key of tableB is (B.id, B.fromDate).

Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id, we know that at most one row will match the join
condition. We will again have the situation where joining with tableB cannot
affect the number of rows in the result set. Since we do not select any
columns from tableB, the whole join operation can be eliminated from the
execution plan.

SQL Server 2005/2008 will deploy table elimination in this situation as well.
We have not found a way to make Oracle 11g use it for this type of query.
Queries like these arise in two situations: either when you have a
denormalized model consisting of a fact table with several related dimension
tables, or when you have a highly normalized model where each attribute is
stored in its own table. The example with the subselect is common whenever you
store historized/versioned data.
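
The row-count argument for the first example can be checked with a small
simulation. This is only an illustrative sketch in Python, not server code:
the function left_outer_join() and the in-memory row lists are made up for the
illustration, and only the sample rows [1, "val1"], [1, "other1a"] and
[1, "other1b"] come from the text above.

```python
def left_outer_join(table_a, table_b):
    """Return A.colA for each row of: A LEFT OUTER JOIN B ON B.id = A.id."""
    result = []
    for a_id, col_a in table_a:
        matches = [b for b in table_b if b[0] == a_id]
        # When nothing matches, one NULL-complemented row is still produced.
        for _ in (matches or [None]):
            result.append(col_a)
    return result


table_a = [(1, "val1"), (2, "val2")]
b_with_duplicates = [(1, "other1a"), (1, "other1b")]  # B.id is not unique
b_unique_id = [(1, "other1a")]                        # B.id is unique

rows_dup = left_outer_join(table_a, b_with_duplicates)
rows_unique = left_outer_join(table_a, b_unique_id)
rows_without_join = [col_a for _, col_a in table_a]
```

With duplicates in B.id the join changes the result, so B must be read. With a
unique B.id the result equals a plain scan of tableA, which is what makes the
join removable.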

LOW-LEVEL DESIGN:

The code (currently in development) is in the
lp:~maria-captains/maria/maria-5.1-table-elimination tree.

<contents>
1. Elimination criteria
2. No outside references check
2.1 Quick check if there are tables with no outside references
3. One-match check
3.1 Functional dependency source #1: Potential eq_ref access
3.2 Functional dependency source #2: col2=func(col1)
3.3 Functional dependency source #3: One or zero records in the table
3.4 Functional dependency check implementation
3.4.1 Equality collection: Option1
3.4.2 Equality collection: Option2
3.4.3 Functional dependency propagation - option 1
3.4.4 Functional dependency propagation - option 2
4. Removal operation properties
5. Removal operation
6. User interface
6.1 @@optimizer_switch flag
6.2 EXPLAIN [EXTENDED]
7. Miscellaneous adjustments
7.1 Fix used_tables() of aggregate functions
7.2 Make subquery predicates collect their outer references
8. Other concerns
8.1 Relationship with outer->inner joins converter
8.2 Relationship with prepared statements
8.3 Relationship with constant table detection
9. Tests and benchmarks
</contents>

It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.

1. Elimination criteria
=======================
We can eliminate the inner side of an outer join nest if:

1. There are no references to columns of the inner tables anywhere else in
   the query.
2. For each record combination of outer tables, it will always produce
   exactly one matching record combination.

Most of the effort in this WL entry goes into checking these two conditions.

2. No outside references check
==============================
Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent
outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner
tables of the outer join nest we're trying to remove.

For multi-table UPDATE/DELETE we also must not remove tables that we're
updating/deleting from or tables that are used in UPDATE's SET clause.
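
Criterion #1 can be sketched with table bitmaps. The snippet below is a
minimal Python illustration, assuming each table is one bit in a map (in the
spirit of the server's table_map); candidate_nests() and the sample names are
hypothetical and do not exist in the server code.

```python
from functools import reduce
from operator import or_


def candidate_nests(nest_inner_tables, outside_usage_maps):
    """Return names of outer join nests with no outside references.

    nest_inner_tables:  dict of nest name -> bitmap of its inner tables
    outside_usage_maps: bitmaps of tables used in the select list, WHERE,
                        GROUP BY/ORDER BY, HAVING and other ON clauses
                        (the nest's own ON clause is not included)
    """
    outside_map = reduce(or_, outside_usage_maps, 0)
    return [name for name, inner_map in nest_inner_tables.items()
            if (inner_map & outside_map) == 0]


# Hypothetical query: A LEFT JOIN B ON ... LEFT JOIN C ON ... WHERE C.x > 0
A, B, C = 1 << 0, 1 << 1, 1 << 2
usage = {"select_list": A, "where": A | C, "group_order_by": 0, "having": 0}
nests = {"oj_B": B, "oj_C": C}

candidates = candidate_nests(nests, usage.values())
```

Here only the nest containing B stays a candidate, because C is referenced
from WHERE.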

2.1 Quick check if there are tables with no outside references
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start searching for outer join nests that could be eliminated,
we'll do a quick and cheap check of whether there could possibly be something
to eliminate:

  if (there are outer joins &&
      (tables used in select_list |
       tables used in group/order by UNION |
       tables used in where) != bitmap_of_all_join_tables)
  {
    attempt table elimination;
  }


3. One-match check
==================
We can eliminate the inner side of an outer join if it will always generate
exactly one matching record combination.

By definition of OUTER JOIN, a NULL-complemented record combination will be
generated when the inner side of the outer join has not produced any matches.

What remains to be checked is that there is no possibility that the inner
side of the outer join could produce more than one matching record
combination.

We'll refer to the one-match property as "functional dependency":

- An outer join nest is functionally dependent [wrt outer tables] if it will
  produce one matching record combination per each record combination of
  outer tables

- A table is functionally dependent wrt a certain set of dependency tables if
  a record combination of dependency tables uniquely identifies zero or one
  matching record in the table

- Definitions of functional dependency of keys (=column tuples) and columns
  are apparent.

Our goal is to prove that the entire join nest is functionally dependent.

A join nest is functionally dependent (on the outside tables) if each of its
elements (those can be either base tables or join nests) is functionally
dependent.

Functional dependency is transitive: if table A is f-dependent on the outer
tables and table B is f-dependent on {A, outer_tables}, then B is functionally
dependent on the outer tables.

Subsequent sections list cases when we can declare a table to be
functionally dependent.

3.1 Functional dependency source #1: Potential eq_ref access
------------------------------------------------------------
This is the most practically important case. Taking the example from the HLS
of this WL entry:

  select
    A.colA
  from
    tableA A
  left outer join
    tableB B
  on
    B.id = A.id;

and generalizing it: a table TBL is functionally dependent if the ON
expression allows us to construct a potential eq_ref access to table TBL that
uses only outer or functionally dependent tables.

In other words: table TBL will have one match if the ON expression can be
converted into this form

  TBL.unique_key=func(one_match_tables) AND .. remainder ...

(with appropriate extension for multi-part keys), where

  one_match_tables= {
    tables that are not on the inner side of the outer join in question, and
    functionally dependent tables
  }

Note that this will cover constant tables, except those that are constant
because they have 0/1 record or are partitioned and have no used partitions.


3.2 Functional dependency source #2: col2=func(col1)
----------------------------------------------------
This comes from the second example in the HLS:

  create unique index idx on tableB (id, fromDate);
  ...
  left outer join
    tableB B
  on
    B.id = A.id
  and
    B.fromDate = (select max(sub.fromDate)
                  from tableB sub where sub.id = A.id);

Here it is apparent that tableB can be eliminated. It is not possible to
construct eq_ref access to tableB, though, because for the second part of the
primary key (fromDate column) we only got a condition in this form:

  B.fromDate= func(tableB)

(we write "func(tableB)" because the ref optimizer can only determine which
tables the right part of the equality depends on).

In the general case, an equality like this doesn't guarantee functional
dependency. For example, if func() == { return fromDate;}, i.e. the ON
expression is

  ... ON B.id = A.id and B.fromDate = B.fromDate

then that would allow table B to have multiple matches per record of table A.
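
The difference between the two ON expressions can be seen on a few sample
rows. This is an illustrative Python sketch only; the rows in table_b and the
two helper functions are invented for the example and are not part of the
design.

```python
# (id, fromDate) pairs; the pair is unique, B.id alone is not.
table_b = [(1, "2009-01-01"), (1, "2009-06-01"), (2, "2009-03-01")]


def matches_latest(a_id):
    # ON B.id = A.id AND
    #    B.fromDate = (select max(sub.fromDate) from tableB sub
    #                  where sub.id = A.id)
    latest = max((d for i, d in table_b if i == a_id), default=None)
    return [(i, d) for i, d in table_b if i == a_id and d == latest]


def matches_self_equality(a_id):
    # ON B.id = A.id AND B.fromDate = B.fromDate
    return [(i, d) for i, d in table_b if i == a_id and d == d]


one_match = matches_latest(1)
many_matches = matches_self_equality(1)
no_match = matches_latest(3)
```

For A.id = 1 the first predicate yields at most one row of B, while the second
yields every row with that id, although both have the shape
"B.fromDate = func(tableB)" at table level.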

In order to be able to distinguish between these two cases, we'll need to go
down to column level:

- A table is functionally dependent if it has a unique key that's
  functionally dependent

- A unique key is functionally dependent when all of its columns are
  functionally dependent

- A table column is functionally dependent if the ON clause allows us to
  extract an AND-part in this form:

    tbl.column = f(functionally-dependent columns or columns of outer tables)

3.3 Functional dependency source #3: One or zero records in the table
---------------------------------------------------------------------
A table with one or zero records cannot generate more than one matching
record. This source is of lesser importance as one/zero-record tables are only
MyISAM tables.

3.4 Functional dependency check implementation
----------------------------------------------
As shown above, we need something similar to KEYUSE structures, but not
exactly that (we need things that the current ref optimizer considers unusable
and don't need things that it considers usable).

3.4.1 Equality collection: Option1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We could
- extend KEYUSE structures to store all kinds of equalities we need
- change update_ref_and_keys() and co. to collect equalities both for ref
  access and for table elimination
  = [possibly] Improve [eq_]ref access to be able to use equalities in
    form keypart2=func(keypart1)
- process the KEYUSE array both by table elimination and by ref access
  optimizer.

+ This requires less effort.
- Code will have to be changed all over sql_select.cc
- update_ref_and_keys() and co. already do several unrelated things. Hooking
  up table elimination will make it even worse.

3.4.2 Equality collection: Option2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alternatively, we could process the WHERE clause totally on our own.
+ Table elimination is a standalone, easy-to-detach module.
- Some code duplication with update_ref_and_keys() and co.
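
The column-level rules from section 3.2 can be sketched as a small fixpoint
computation. The Python below is an illustration under stated assumptions, not
the server implementation: it assumes the equalities have already been
collected with column-level right-hand sides, that columns are written as
"table.column" strings, and that every column of an outer or already dependent
table counts as bound. All names in it are hypothetical.

```python
def table_of(column):
    """'B.fromDate' -> 'B' (assumed naming convention for this sketch)."""
    return column.split(".", 1)[0]


def functionally_dependent_tables(unique_keys, equalities, outer_tables):
    """
    unique_keys:  dict of table -> list of unique keys (tuples of columns)
    equalities:   list of (column, set of columns used on the right side)
    outer_tables: set of tables outside the join nest being checked
    """
    bound_columns = set()
    dependent_tables = set()

    def is_bound(column):
        table = table_of(column)
        return (table in outer_tables or table in dependent_tables
                or column in bound_columns)

    changed = True
    while changed:  # repeat until no new column or table gets bound
        changed = False
        for column, used in equalities:
            if not is_bound(column) and all(is_bound(c) for c in used):
                bound_columns.add(column)
                changed = True
        for table, keys in unique_keys.items():
            if table not in dependent_tables and any(
                    all(is_bound(c) for c in key) for key in keys):
                dependent_tables.add(table)
                changed = True
    return dependent_tables


keys = {"B": [("B.id", "B.fromDate")]}
# B.fromDate = (select max(...) ... where sub.id = A.id): depends on A.id only
subquery_case = functionally_dependent_tables(
    keys, [("B.id", {"A.id"}), ("B.fromDate", {"A.id"})], {"A"})
# B.fromDate = B.fromDate: depends on an unbound column of B itself
self_equality_case = functionally_dependent_tables(
    keys, [("B.id", {"A.id"}), ("B.fromDate", {"B.fromDate"})], {"A"})
```

In the first case both key parts of (id, fromDate) become bound, so B is
reported as functionally dependent; in the second case B.fromDate never
becomes bound and B is not.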

Having got the equalities, we'll have to propagate the functional dependency
property to unique keys, tables and, ultimately, join nests.

3.4.3 Functional dependency propagation - option 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Borrow the approach used in constant table detection code:

  do
  {
    converted= FALSE;
    for each table T in join nest
    {
      if (check_if_functionally_dependent(T))
        converted= TRUE;
    }
  } while (converted == TRUE);

  check_if_functionally_dependent(T)
  {
    if (T has eq_ref access based on func_dep_tables)
      return TRUE;

    Apply the same do-while loop-based approach to available equalities
      T.column1=func(other columns)
    to spread the set of functionally-dependent columns. The goal is to get
    all columns of a certain unique key to be bound.
  }


3.4.4 Functional dependency propagation - option 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Analyze the ON expression(s) and build a list of

  tbl.field = expr(...)

equalities. tbl here is a table that belongs to a join nest that could
potentially be eliminated.

Besides those, add to the list
 - An element for each unique key in the table that needs to be eliminated
 - An element for each table that needs to be eliminated
 - An element for each join nest that can be eliminated (i.e. has no
   references from outside).

Then, set up "reverse dependencies": each element should have pointers to
elements that are functionally dependent on it:

- "tbl.field=expr(...)" equality is functionally dependent on all fields that
  are used in "expr(...)" (here we take into account only fields that belong
  to tables that can potentially be eliminated).
- a unique key is dependent on all of its components
- a table is dependent on all of its unique keys
- a join nest is dependent on all tables that it contains

These pointers are stored in the form of one bitmap, such that:

  "X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )

Each object also stores the number of dependencies that need to be satisfied
before it itself is satisfied:

- "tbl.field=expr(...)" needs all its underlying fields (if a field is
  referenced many times it is counted only once)

- a unique key needs all of its key parts

- a table needs only one of its unique keys

- a join nest needs all of its tables

(TODO: so what do we do when we've marked a table as constant? We'll need to
update the "field=expr(....)" elements that use fields of that table. And the
problem is that we won't know how much to decrement from the counters of those
elements.

Solution#1: switch to table_map() based approach.
Solution#2: introduce separate elements for each involved field.
            field will depend on its table,
            "field=expr" will depend on fields.
)

Besides the above, let each element have a pointer to another element, so that
we can have a linked list of elements.

After the above structures have been created, we start the main algorithm.

The first step is to create a list of functionally-dependent elements. We walk
across the array of dependencies and mark those elements that are already
bound (i.e. their dependencies are satisfied). At the moment the only
immediately-bound elements are "field=expr" dependencies that don't refer to
any columns that are not bound.

The second step is the loop

  while (bound_list is not empty)
  {
    Take the first bound element F off the list.
    Use the bitmap to find out what other elements depended on it
    for each such element E
    {
      if (E becomes bound after F is bound)
        add E to the list;
    }
  }

The last step is to walk through elements that represent the join nests. Those
that are bound can be eliminated.

4. Removal operation properties
===============================
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
  our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.


5. Removal operation
====================
(This depends a lot on whether we make table elimination a one-off rewrite or
conditional)

At the moment table elimination is re-done for each join re-execution, hence
the removal operation is designed not to modify any statement's permanent
members.

* Remove the outer join nest's nested join structure (i.e. get the
  outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
  $OJ->embedding->nested_join. Update table_map's of all ancestor nested
  joins). [MARK2]

* Move the tables and their JOIN_TABs to the front of the join order, like it
  is done with const tables, with the exception that if the eliminated outer
  join nest was within another outer join nest, that shouldn't prevent us from
  moving away the eliminated tables.

* Update join->table_count and all-join-tables bitmap.
  ^ TODO: not true anymore ^

* That's it. Nothing else?

6. User interface
=================

6.1 @@optimizer_switch flag
---------------------------
Argument against adding the flag:
* It is always better to perform table elimination than not to do it.

Arguments for the flag:
* It is always theoretically possible that the new code will cause unintended
  slowdowns.
* Having the flag is useful for QA and comparative benchmarking.

Decision so far: add the flag under #ifdef. Make the flag be present in debug
builds.

6.2 EXPLAIN [EXTENDED]
----------------------
There are two possible options:
1. Show eliminated tables, like we do with const tables.
2. Do not show eliminated tables.

We chose option 2, because:
- the table is not accessed at all (besides locking it)
- it is more natural for an anchor model user: when querying an
  anchor-and-attributes view, the user doesn't care about the unused
  attributes.

EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either.

NOTE: Before this WL, the warning text was generated after all JOIN objects
had been destroyed. This didn't allow using information about the join plan
when printing the warning. We've fixed this by keeping the JOIN objects until
the warning text has been generated.

Table elimination removes inner sides of outer joins, and logically the ON
clause is also removed. If this clause has any subqueries, they will
also be removed from EXPLAIN output.

An exception to the above is that if we eliminate a derived table, it will
still be shown in EXPLAIN output. This comes from the fact that the FROM
subqueries are evaluated before table elimination is invoked.
TODO: Is the above ok or still remove parts of FROM subqueries?

7. Miscellaneous adjustments
============================

7.1 Fix used_tables() of aggregate functions
--------------------------------------------
Aggregate functions used to report that they depend on all tables, that is,

  item_agg_func->used_tables() == (1ULL << join->tables) - 1

always. Fixed it: now an aggregate function reports that it depends on the
tables that its arguments depend on. In particular, COUNT(*) reports that it
depends on no tables (item_count_star->used_tables()==0). One consequence of
that is that "item->used_tables()==0" is not equivalent to
"item->const_item()==true" anymore (not sure if it's "anymore" or this has
been already so for some items).

7.2 Make subquery predicates collect their outer references
-----------------------------------------------------------
Per-column functional dependency analysis requires us to take a

  tbl.field = func(...)

equality and tell which columns of which tables are referred to from the
func(...) expression.
For scalar expressions, this is accomplished by Item::walk()-based traversal. It should be reasonably cheap (the only practical Item that can be expensive to traverse seems to be a special case of "col IN (const1,const2, ...)". check if we traverse the long list for such items). For correlated subqueries, traversal can be expensive, it is cheaper to make each subquery item have a list of its outer references. The list can be collected at fix_fields() stage with very little extra cost, and then it could be used for other optimizations. 8. Other concerns ================= 8.1 Relationship with outer->inner joins converter -------------------------------------------------- One could suspect that outer->inner join conversion could get in the way of table elimination by changing outer joins (which could be eliminated) to inner (which we will not try to eliminate). This concern is not valid: we make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (this is a requirement for the conversion) then table elimination would not be applicable anyway. 8.2 Relationship with prepared statements ----------------------------------------- On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have to limit the applicability by removing [MARK1] as that can change during lifetime of the statement. The other option is to do table elimination every time. This will require to rework operation [MARK2] to be undoable. 8.3 Relationship with constant table detection ---------------------------------------------- Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't). Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? 
If there is, should we mark const tables also as eliminated? from user/EXPLAIN point of view: no. constant table is the one that we read one record from. eliminated table is the one that we don't acccess at all. TODO 9. Tests and benchmarks ======================= Create a benchmark in sql-bench which checks if the DBMS has table elimination. [According to Monty] Run - query Q1 that would use elimination - query Q2 that is very similar to Q1 (so that they would have same QEP, execution cost, etc) but cannot use table elimination. then compare run times and make a conclusion about whether the used dbms supports table elimination. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)


[Maria-developers] Updated (by Guest): Table elimination (17)
by worklog-noreply＠askmonty.org 16 Aug '09


-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: 9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 1
ESTIMATE.......: 3 (hours remain)
ORIG. ESTIMATE.: 3

PROGRESS NOTES:

-=-=(Guest - Sun, 16 Aug 2009, 16:16)=-=-
Category updated.
--- /tmp/wklog.17.old.24882 2009-08-16 16:16:49.000000000 +0300
+++ /tmp/wklog.17.new.24882 2009-08-16 16:16:49.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-Sprint

-=-=(Guest - Sun, 16 Aug 2009, 16:16)=-=-
Version updated.
--- /tmp/wklog.17.old.24882 2009-08-16 16:16:49.000000000 +0300
+++ /tmp/wklog.17.new.24882 2009-08-16 16:16:49.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x

-=-=(Guest - Wed, 29 Jul 2009, 21:41)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.26011 2009-07-29 21:41:04.000000000 +0300
+++ /tmp/wklog.17.new.26011 2009-07-29 21:41:04.000000000 +0300
@@ -2,163 +2,146 @@
 ~maria-captains/maria/maria-5.1-table-elimination tree.

 <contents>
-1. Conditions for removal
-1.1 Quick check if there are candidates
-2. Removal operation properties
-3. Removal operation
-4. User interface
-5. Tests and benchmarks
-6. Todo, issues to resolve
-6.1 To resolve
-6.2 Resolved
-7. Additional issues
+1. Elimination criteria
+2. No outside references check
+2.1 Quick check if there are tables with no outside references
+3. One-match check
+3.1 Functional dependency source #1: Potential eq_ref access
+3.2 Functional dependency source #2: col2=func(col1)
+3.3 Functional dependency source #3: One or zero records in the table
+3.4 Functional dependency check implementation
+3.4.1 Equality collection: Option1
+3.4.2 Equality collection: Option2
+3.4.3 Functional dependency propagation - option 1
+3.4.4 Functional dependency propagation - option 2
+4. Removal operation properties
+5. Removal operation
+6. User interface
+6.1 @@optimizer_switch flag
+6.2 EXPLAIN [EXTENDED]
+7. Miscellaneous adjustments
+7.1 Fix used_tables() of aggregate functions
+7.2 Make subquery predicates collect their outer references
+8. Other concerns
+8.1 Relationship with outer->inner joins converter
+8.2 Relationship with prepared statements
+8.3 Relationship with constant table detection
+9. Tests and benchmarks
 </contents>

 It's not really about elimination of tables, it's about elimination of inner
 sides of outer joins.

-1. Conditions for removal
--------------------------
-We can eliminate an inner side of outer join if:
-1. For each record combination of outer tables, it will always produce
-   exactly one record.
-2. There are no references to columns of the inner tables anywhere else in
+1. Elimination criteria
+=======================
+We can eliminate inner side of an outer join nest if:
+
+1. There are no references to columns of the inner tables anywhere else in
    the query.
+2. For each record combination of outer tables, it will always produce
+   exactly one matching record combination.
+
+Most of effort in this WL entry is checking these two conditions.

-#1 means that every table inside the outer join nest is:
- - is a constant table:
-   = because it can be accessed via eq_ref(const) access, or
-   = it is a zero-rows or one-row MyISAM-like table [MARK1]
- - has an eq_ref access method candidate.
-
-#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
-   GROUP BY and HAVING do not refer to the inner tables of the outer join
-   nest.
-
-1.1 Quick check if there are candidates
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Before we start to enumerate join nests, here is a quick way to check if
-there *can be* something to be removed:
+2. No outside references check
+==============================
+Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent
+outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner
+tables of the outer join nest we're trying to remove.
+
+For multi-table UPDATE/DELETE we also must not remove tables that we're
+updating/deleting from or tables that are used in UPDATE's SET clause.
+
+2.1 Quick check if there are tables with no outside references
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Before we start searching for outer join nests that could be eliminated,
+we'll do a quick and cheap check if there possibly could be something that
+could be eliminated:

-  if ((tables used in select_list |
+  if (there are outer joins &&
+      (tables used in select_list |
        tables used in group/order by UNION |
-       tables used in where) != bitmap_of_all_tables)
+       tables used in where) != bitmap_of_all_join_tables)
   {
     attempt table elimination;
   }

-2. Removal operation properties
--------------------------------
-* There is always one way to remove (no choice to remove either this or that)
-* It is always better to remove as much tables as possible (at least within
-  our cost model).
-Thus, no need for any cost calculations/etc. It's an unconditional rewrite.

-3. Removal operation
---------------------
-* Remove the outer join nest's nested join structure (i.e. get the
-  outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
-  $OJ->embedding->nested_join. Update table_map's of all ancestor nested
-  joins). [MARK2]
+3. One-match check
+==================
+We can eliminate inner side of outer join if it will always generate exactly
+one matching record combination.

-* Move the tables and their JOIN_TABs to front like it is done with const
-  tables, with exception that if eliminated outer join nest was within
-  another outer join nest, that shouldn't prevent us from moving away the
-  eliminated tables.
+By definition of OUTER JOIN, a NULL-complemented record combination will be
+generated when the inner side of outer join has not produced any matches.

-* Update join->table_count and all-join-tables bitmap.
+What remains to be checked is that there is no possiblity that inner side of
+the outer join could produce more than one matching record combination.

-* That's it. Nothing else?
+We'll refer to one-match property as "functional dependency":

-4. User interface
------------------
-* We'll add an @@optimizer switch flag for table elimination. Tentative
-  name: 'table_elimination'.
-  (Note ^^ utility of the above questioned ^, as table elimination can never
-  be worse than no elimination. We're leaning towards not adding the flag)
-
-* EXPLAIN will not show the removed tables at all. This will allow to check
-  if tables were removed, and also will behave nicely with anchor model and
-  VIEWs: stuff that user doesn't care about just won't be there.
+- A outer join nest is functionally dependent [wrt outer tables] if it will
+  produce one matching record combination per each record combination of
+  outer tables

-5. Tests and benchmarks
------------------------
-Create a benchmark in sql-bench which checks if the DBMS has table
-elimination.
-[According to Monty] Run
- - queries that would use elimination
- - queries that are very similar to one above (so that they would have same
-   QEP, execution cost, etc) but cannot use table elimination.
-then compare run times and make a conclusion about whether dbms supports table
-elimination.
+- A table is functionally dependent wrt certain set of dependency tables, if
+  record combination of dependency tables uniquely identifies zero or one
+  matching record in the table

-6. Todo, issues to resolve
---------------------------
+- Definitions of functional dependency of keys (=column tuples) and columns are
+  apparent.

-6.1 To resolve
-~~~~~~~~~~~~~~
-- Relationship with prepared statements.
-  On one hand, it's natural to desire to make table elimination a
-  once-per-statement operation, like outer->inner join conversion. We'll have
-  to limit the applicability by removing [MARK1] as that can change during
-  lifetime of the statement.
-
-  The other option is to do table elimination every time. This will require to
-  rework operation [MARK2] to be undoable.
-
-  I'm leaning towards doing the former. With anchor modeling, it is unlikely
-  that we'll meet outer joins which have N inner tables of which some are 1-row
-  MyISAM tables that do not have primary key.
-
-6.2 Resolved
-~~~~~~~~~~~~
-* outer->inner join conversion is not a problem for table elimination.
-  We make outer->inner conversions based on predicates in WHERE. If the WHERE
-  referred to an inner table (requirement for OJ->IJ conversion) then table
-  elimination would not be applicable anyway.
-
-* For Multi-table UPDATEs/DELETEs, need to also analyze the SET clause:
-  - affected tables must not be eliminated
-  - tables that are used on the right side of the SET x=y assignments must
-    not be eliminated either.
+Our goal is to prove that the entire join nest is functionally-dependent.

-* Aggregate functions used to report that they depend on all tables, that is,
+Join nest is functionally dependent (on the otside tables) if each of its
+elements (those can be either base tables or join nests) is functionally
+dependent.

-    item_agg_func->used_tables() == (1ULL << join->tables) - 1
+Functional dependency is transitive: if table A is f-dependent on the outer
+tables and table B is f.dependent on {A, outer_tables} then B is functionally
+dependent on the outer tables.
+
+Subsequent sections list cases when we can declare a table to be
+functionally-dependent.
+
+3.1 Functional dependency source #1: Potential eq_ref access
+------------------------------------------------------------
+This is the most practically-important case. Taking the example from the HLD
+of this WL entry:
+
+  select
+    A.colA
+  from
+    tableA A
+  left outer join
+    tableB B
+  on
+    B.id = A.id;

-  always. Fixed it, now aggregate function reports it depends on
-  tables that its arguments depend on. In particular, COUNT(*) reports
-  that it depends on no tables (item_count_star->used_tables()==0).
-  One consequence of that is that "item->used_tables()==0" is not
-  equivalent to "item->const_item()==true" anymore (not sure if it's
-  "anymore" or this has been already happening).
-
-* EXPLAIN EXTENDED warning text was generated after the JOIN object has
-  been discarded. This didn't allow to use information about join plan
-  when printing the warning. Fixed this by keeping the JOIN objects until
-  we've printed the warning (have also an intent to remove the const
-  tables from the join output).
-
-7. Additional issues
---------------------
-* We remove ON clauses within outer join nests. If these clauses contain
-  subqueries, they probably should be gone from EXPLAIN output also?
-  Yes. Current approach: when removing an outer join nest, walk the ON clause
-  and mark subselects as eliminated. Then let EXPLAIN code check if the
-  SELECT was eliminated before the printing (EXPLAIN is generated by doing
-  a recursive descent, so the check will also cause children of eliminated
-  selects not to be printed)
-
-* Table elimination is performed after constant table detection (but before
-  the range analysis). Constant tables are technically different from
-  eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
-  Considering we've already done the join_read_const_table() call, is there any
-  real difference between constant table and eliminated one? If there is, should
-  we mark const tables also as eliminated?
-  from user/EXPLAIN point of view: no. constant table is the one that we read
-  one record from. eliminated table is the one that we don't acccess at all.
+and generalizing it: a table TBL is functionally-dependent if the ON
+expression allows to construct a potential eq_ref access to table TBL that
+uses only outer or functionally-dependent tables.
+
+In other words: table TBL will have one match if the ON expression can be
+converted into this form
+
+  TBL.unique_key=func(one_match_tables) AND .. remainder ...
+
+(with appropriate extension for multi-part keys), where
+
+  one_match_tables= {
+    tables that are not on the inner side of the outer join in question, and
+    functionally dependent tables
+  }
+
+Note that this will cover constant tables, except those that are constant because
+they have 0/1 record or are partitioned and have no used partitions.
+
+
+3.2 Functional dependency source #2: col2=func(col1)
+----------------------------------------------------
+This comes from the second example in the HLS:

-* What is described above will not be able to eliminate this outer join
     create unique index idx on tableB (id, fromDate);
     ...
     left outer join
@@ -169,32 +152,331 @@
       B.fromDate = (select max(sub.fromDate)
                     from tableB sub where sub.id = A.id);

-  This is because condition "B.fromDate= func(tableB)" cannot be used.
-  Reason#1: update_ref_and_keys() does not consider such conditions to
-            be of any use (and indeed they are not usable for ref access)
-            so they are not put into KEYUSE array.
-  Reason#2: even if they were put there, we would need to be able to tell
-            between predicates like
-              B.fromDate= func(B.id)  // guarantees only one matching row as
-                                      // B.id is already bound by B.id=A.id
-                                      // hence B.fromDate becomes bound too.
-            and
-              "B.fromDate= func(B.*)" // Can potentially have many matching
-                                      // records.
-  We need to
-  - Have update_ref_and_keys() create KEYUSE elements for such equalities
-  - Have eliminate_tables() and friends make a more accurate check.
-  The right check is to check whether all parts of a unique key are bound.
-  If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
-  keypartY to be bound.
-  The difficulty here is that correlated subquery predicate cannot tell what
-  columns it depends on (it only remembers tables).
-  Traversing the predicate is expensive and complicated.
-  We're leaning towards making each subquery predicate have a List<Item> with
-  items that
-  - are in the current select
-  - and it depends on.
-  This list will be useful in certain other subquery optimizations as well,
-  it is cheap to collect it in fix_fields() phase, so it will be collected
-  for every subquery predicate.
+Here it is apparent that tableB can be eliminated. It is not possible to
+construct eq_ref access to tableB, though, because for the second part of the
+primary key (fromDate column) we only got a condition in this form:
+
+  B.fromDate= func(tableB)
+
+(we write "func(tableB)" because ref optimizer can only determine which tables
+the right part of the equality depends on).
+
+In general case, equality like this doesn't guarantee functional dependency.
+For example, if func() == { return fromDate;}, i.e the ON expression is
+
+  ... ON B.id = A.id and B.fromDate = B.fromDate
+
+then that would allow table B to have multiple matches per record of table A.
+
+In order to be able to distinguish between these two cases, we'll need to go
+down to column level:
+
+- A table is functionally dependent if it has a unique key that's functionally
+  dependent
+
+- A unique key is functionally dependent when all of its columns are
+  functionally dependent
+
+- A table column is functionally dependent if the ON clause allows to extract
+  an AND-part in this form:
+
+    tbl.column = f(functionally-dependent columns or columns of outer tables)
+
+3.3 Functional dependency source #3: One or zero records in the table
+---------------------------------------------------------------------
+A table with one or zero records cannot generate more than one matching
+record. This source is of lesser importance as one/zero-record tables are only
+MyISAM tables.
+
+3.4 Functional dependency check implementation
+----------------------------------------------
+As shown above, we need something similar to KEYUSE structures, but not
+exactly that (we need things that current ref optimizer considers unusable and
+don't need things that it considers usable).
+
+3.4.1 Equality collection: Option1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We could
+- extend KEYUSE structures to store all kinds of equalities we need
+- change update_ref_and_keys() and co. to collect equalities both for ref
+  access and for table elimination
+  = [possibly] Improve [eq_]ref access to be able to use equalities in
+    form keypart2=func(keypart1)
+- process the KEYUSE array both by table elimination and by ref access
+  optimizer.
+
++ This requires less effort.
+- Code will have to be changed all over sql_select.cc
+- update_ref_and_keys() and co. already do several unrelated things. Hooking
+  up table elimination will make it even worse.
+
+3.4.2 Equality collection: Option2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Alternatively, we could process the WHERE clause totally on our own.
++ Table elimination is standalone and easy to detach module.
+- Some code duplication with update_ref_and_keys() and co.
+
+Having got the equalities, we'll to propagate functional dependency property
+to unique keys, tables and, ultimately, join nests.
+
+3.4.3 Functional dependency propagation - option 1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Borrow the approach used in constant table detection code:
+
+  do
+  {
+    converted= FALSE;
+    for each table T in join nest
+    {
+      if (check_if_functionally_dependent(T))
+        converted= TRUE;
+    }
+  } while (converted == TRUE);
+
+  check_if_functionally_dependent(T)
+  {
+    if (T has eq_ref access based on func_dep_tables)
+      return TRUE;
+
+    Apply the same do-while loop-based approach to available equalities
+      T.column1=func(other columns)
+    to spread the set of functionally-dependent columns. The goal is to get
+    all columns of a certain unique key to be bound.
+  }
+
+
+3.4.4 Functional dependency propagation - option 2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Analyze the ON expression(s) and build a list of
+
+  tbl.field = expr(...)
+
+equalities. tbl here is a table that belongs to a join nest that could
+potentially be eliminated.
+
+besides those, add to the list
+ - An element for each unique key in the table that needs to be eliminated
+ - An element for each table that needs to be eliminated
+ - An element for each join nest that can be eliminated (i.e. has no
+   references from outside).
+
+Then, setup "reverse dependencies": each element should have pointers to
+elements that are functionally dependent on it:
+
+- "tbl.field=expr(...)" equality is functionally dependent on all fields that
+  are used in "expr(...)" (here we take into account only fields that belong
+  to tables that can potentially be eliminated).
+- a unique key is dependent on all of its components
+- a table is dependent on all of its unique keys
+- a join nest is dependent on all tables that it contains
+
+These pointers are stored in form of one bitmap, such that:
+
+  "X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )
+
+Each object also stores a number of dependencies it needs to be satisfied
+before it itself is satisfied:
+
+- "tbl.field=expr(...)" needs all its underlying fields (if a field is
+  referenced many times it is counted only once)
+
+- a unique key needs all of its key parts
+
+- a table needs only one of its unique keys
+
+- a join nest needs all of its tables
+
+(TODO: so what do we do when we've marked a table as constant? We'll need to
+update the "field=expr(....)" elements that use fields of that table. And the
+problem is that we won't know how much to decrement from the counters of those
+elements.
+
+Solution#1: switch to table_map() based approach.
+Solution#2: introduce separate elements for each involved field.
+            field will depend on its table,
+            "field=expr" will depend on fields.
+)
+
+Besides the above, let each element have a pointer to another element, so that
+we can have a linked list of elements.
+
+After the above structures have been created, we start the main algorithm.
+
+The first step is to create a list of functionally-dependent elements. We walk
+across array of dependencies and mark those elements that are already bound
+(i.e. their dependencies are satisfied). At the moment those immediately-bound
+are only "field=expr" dependencies that don't refer to any columns that are
+not bound.
+
+The second step is the loop
+
+  while (bound_list is not empty)
+  {
+    Take the first bound element F off the list.
+    Use the bitmap to find out what other elements depended on it
+    for each such element E
+    {
+      if (E becomes bound after F is bound)
+        add E to the list;
+    }
+  }
+
+The last step is to walk through elements that represent the join nests. Those
+that are bound can be eliminated.
+
+4. Removal operation properties
+===============================
+* There is always one way to remove (no choice to remove either this or that)
+* It is always better to remove as much tables as possible (at least within
+  our cost model).
+Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
+
+
+5. Removal operation
+====================
+(This depends a lot on whether we make table elimination a one-off rewrite or
+conditional)
+
+At the moment table elimination is re-done for each join re-execution, hence
+the removal operation is designed not to modify any statement's permanent
+members.
+
+* Remove the outer join nest's nested join structure (i.e. get the
+  outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
+  $OJ->embedding->nested_join. Update table_map's of all ancestor nested
+  joins). [MARK2]
+
+* Move the tables and their JOIN_TABs to the front of join order, like it is
+  done with const tables, with exception that if eliminated outer join nest
+  was within another outer join nest, that shouldn't prevent us from moving
+  away the eliminated tables.
+
+* Update join->table_count and all-join-tables bitmap.
+  ^ TODO: not true anymore ^
+
+* That's it. Nothing else?
+
+6. User interface
+=================
+
+6.1 @@optimizer_switch flag
+---------------------------
+Argument againist adding the flag:
+* It is always better to perform table elimination than not to do it.
+
+Arguments for the flag:
+* It is always theoretically possible that the new code will cause unintended
+  slowdowns.
+* Having the flag is useful for QA and comparative benchmarking.
+
+Decision so far: add the flag under #ifdef. Make the flag be present in debug
+builds.
+
+6.2 EXPLAIN [EXTENDED]
+----------------------
+There are two possible options:
+1. Show eliminated tables, like we do with const tables.
+2. Do not show eliminated tables.
+
+We chose option 2, because:
+- the table is not accessed at all (besides locking it)
+- it is more natural for anchor model user - when he's querying an anchor-
+  and attributes view, he doesn't care about the unused attributes.
+
+EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either.
+
+NOTE: Before this WL, the warning text was generated after all JOIN objects
+have been destroyed. This didn't allow to use information about join plan
+when printing the warning. We've fixed this by keeping the JOIN objects until
+the warning text has been generated.
+
+Table elimination removes inner sides of outer join, and logically the ON
+clause is also removed. If this clause has any subqueries, they will be
+also removed from EXPLAIN output.
+
+An exception to the above is that if we eliminate a derived table, it will
+still be shown in EXPLAIN output. This comes from the fact that the FROM
+subqueries are evaluated before table elimination is invoked.
+TODO: Is the above ok or still remove parts of FROM subqueries?
+
+7. Miscellaneous adjustments
+============================
+
+7.1 Fix used_tables() of aggregate functions
+--------------------------------------------
+Aggregate functions used to report that they depend on all tables, that is,
+
+   item_agg_func->used_tables() == (1ULL << join->tables) - 1
+
+always. Fixed it, now aggregate function reports that it depends on the
+tables that its arguments depend on. In particular, COUNT(*) reports that it
+depends on no tables (item_count_star->used_tables()==0). One consequence of
+that is that "item->used_tables()==0" is not equivalent to
+"item->const_item()==true" anymore (not sure if it's "anymore" or this has
+been already so for some items).
+
+7.2 Make subquery predicates collect their outer references
+-----------------------------------------------------------
+Per-column functional dependency analysis requires us to take a
+
+  tbl.field = func(...)
+
+equality and tell which columns of which tables are referred from func(...)
+expression. For scalar expressions, this is accomplished by Item::walk()-based
+traversal. It should be reasonably cheap (the only practical Item that can be
+expensive to traverse seems to be a special case of "col IN (const1,const2,
+...)". check if we traverse the long list for such items).
+
+For correlated subqueries, traversal can be expensive, it is cheaper to make
+each subquery item have a list of its outer references. The list can be
+collected at fix_fields() stage with very little extra cost, and then it could
+be used for other optimizations.
+
+
+8. Other concerns
+=================
+
+8.1 Relationship with outer->inner joins converter
+--------------------------------------------------
+One could suspect that outer->inner join conversion could get in the way
+of table elimination by changing outer joins (which could be eliminated)
+to inner (which we will not try to eliminate).
+This concern is not valid: we make outer->inner conversions based on
+predicates in WHERE. If the WHERE referred to an inner table (this is a
+requirement for the conversion) then table elimination would not be
+applicable anyway.
+
+8.2 Relationship with prepared statements
+-----------------------------------------
+On one hand, it's natural to desire to make table elimination a
+once-per-statement operation, like outer->inner join conversion. We'll have
+to limit the applicability by removing [MARK1] as that can change during
+lifetime of the statement.
+
+The other option is to do table elimination every time. This will require to
+rework operation [MARK2] to be undoable.
+
+
+8.3 Relationship with constant table detection
+----------------------------------------------
+Table elimination is performed after constant table detection (but before
+the range analysis). Constant tables are technically different from
+eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't).
+Considering we've already done the join_read_const_table() call, is there any
+real difference between constant table and eliminated one? If there is, should
+we mark const tables also as eliminated?
+from user/EXPLAIN point of view: no. constant table is the one that we read
+one record from. eliminated table is the one that we don't acccess at all.
+TODO
+
+9. Tests and benchmarks
+=======================
+Create a benchmark in sql-bench which checks if the DBMS has table
+elimination.
+[According to Monty] Run
+ - query Q1 that would use elimination
+ - query Q2 that is very similar to Q1 (so that they would have same
+   QEP, execution cost, etc) but cannot use table elimination.
+then compare run times and make a conclusion about whether the used dbms
+supports table elimination.

-=-=(Guest - Thu, 23 Jul 2009, 20:07)=-=-
Dependency created: 29 now depends on 17

-=-=(Monty - Thu, 23 Jul 2009, 09:19)=-=-
Version updated.
--- /tmp/wklog.17.old.24090 2009-07-23 09:19:32.000000000 +0300
+++ /tmp/wklog.17.new.24090 2009-07-23 09:19:32.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+Server-5.1

-=-=(Guest - Mon, 20 Jul 2009, 14:28)=-=-
deukje weg

Worked 1 hour and estimate 3 hours remain (original estimate increased by 4 hours).

-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24138 2009-07-17 02:44:49.000000000 +0300
+++ /tmp/wklog.17.new.24138 2009-07-17 02:44:49.000000000 +0300
@@ -1 +1 @@
-9.x
+Server-9.x

-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Version updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-5.1
+9.x

-=-=(Guest - Fri, 17 Jul 2009, 02:44)=-=-
Category updated.
--- /tmp/wklog.17.old.24114 2009-07-17 02:44:36.000000000 +0300
+++ /tmp/wklog.17.new.24114 2009-07-17 02:44:36.000000000 +0300
@@ -1 +1 @@
-Server-Sprint
+Client-BackLog

-=-=(Guest - Thu, 18 Jun 2009, 04:15)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.29969 2009-06-18 04:15:23.000000000 +0300
+++ /tmp/wklog.17.new.29969 2009-06-18 04:15:23.000000000 +0300
@@ -158,3 +158,43 @@
   from user/EXPLAIN point of view: no. constant table is the one that we read
   one record from. eliminated table is the one that we don't acccess at all.

+* What is described above will not be able to eliminate this outer join
+    create unique index idx on tableB (id, fromDate);
+    ...
+    left outer join
+      tableB B
+    on
+      B.id = A.id
+    and
+      B.fromDate = (select max(sub.fromDate)
+                    from tableB sub where sub.id = A.id);
+
+  This is because condition "B.fromDate= func(tableB)" cannot be used.
+  Reason#1: update_ref_and_keys() does not consider such conditions to
+            be of any use (and indeed they are not usable for ref access)
+            so they are not put into KEYUSE array.
+  Reason#2: even if they were put there, we would need to be able to tell
+            between predicates like
+              B.fromDate= func(B.id)  // guarantees only one matching row as
+                                      // B.id is already bound by B.id=A.id
+                                      // hence B.fromDate becomes bound too.
+            and
+              "B.fromDate= func(B.*)" // Can potentially have many matching
+                                      // records.
+  We need to
+  - Have update_ref_and_keys() create KEYUSE elements for such equalities
+  - Have eliminate_tables() and friends make a more accurate check.
+  The right check is to check whether all parts of a unique key are bound.
+  If we have keypartX to be bound, then t.keypartY=func(keypartX) makes
+  keypartY to be bound.
+  The difficulty here is that correlated subquery predicate cannot tell what
+  columns it depends on (it only remembers tables).
+  Traversing the predicate is expensive and complicated.
+  We're leaning towards making each subquery predicate have a List<Item> with
+  items that
+  - are in the current select
+  - and it depends on.
+  This list will be useful in certain other subquery optimizations as well,
+  it is cheap to collect it in fix_fields() phase, so it will be collected
+  for every subquery predicate.
+ ------------------------------------------------------------ -=-=(View All Progress Notes, 28 total)=-=- http://askmonty.org/worklog/index.pl?tid=17&nolimit=1 DESCRIPTION: Eliminate not needed tables from SELECT queries.. This will speed up some views and automatically generated queries. Example: CREATE TABLE B (id int primary key); select A.colA from tableA A left outer join tableB B on B.id = A.id; In this case we can remove table B and the join from the query. HIGH-LEVEL SPECIFICATION: Here is an extended explanation of table elimination. Table elimination is a feature found in some modern query optimizers, of which Microsoft SQL Server 2005/2008 seems to have the most advanced implementation. Oracle 11g has also been confirmed to use table elimination but not to the same extent. Basically, what table elimination does, is to remove tables from the execution plan when it is unnecessary to include them. This can, of course, only happen if the right circumstances arise. Let us for example look at the following query: select A.colA from tableA A left outer join tableB B on B.id = A.id; When using A as the left table we ensure that the query will return at least as many rows as there are in that table. For rows where the join condition (B.id = A.id) is not met the selected column (A.colA) will still contain it's original value. The not seen B.* row would contain all NULL:s. However, the result set could actually contain more rows than what is found in tableA if there are duplicates of the column B.id in tableB. If A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"] then two rows will match in the join condition. The only way to know what the result will look like is to actually touch both tables during execution. Instead, let's say that tableB contains rows that make it possible to place a unique constraint on the column B.id, for example and often the case a primary key. 
In this situation we know that we will get exactly as many rows as there are in tableA, since joining with tableB cannot introduce any duplicates. If further, as in the example query, we do not select any columns from tableB, touching that table during execution is unnecessary. We can remove the whole join operation from the execution plan. Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination in the case described above. Let us look at a more advanced query, where Oracle fails. select A.colA from tableA A left outer join tableB B on B.id = A.id and B.fromDate = ( select max(sub.fromDate) from tableB sub where sub.id = A.id ); In this example we have added another join condition, which ensures that we only pick the matching row from tableB having the latest fromDate. In this case tableB will contain duplicates of the column B.id, so in order to ensure uniqueness the primary key has to contain the fromDate column as well. In other words the primary key of tableB is (B.id, B.fromDate). Furthermore, since the subselect ensures that we only pick the latest B.fromDate for a given B.id we know that at most one row will match the join condition. We will again have the situation where joining with tableB cannot affect the number of rows in the result set. Since we do not select any columns from tableB, the whole join operation can be eliminated from the execution plan. SQL Server 2005/2008 will deploy table elimination in this situation as well. We have not found a way to make Oracle 11g use it for this type of query. Queries like these arise in two situations. Either when you have denormalized model consisting of a fact table with several related dimension tables, or when you have a highly normalized model where each attribute is stored in its own table. The example with the subselect is common whenever you store historized/versioned data. 
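The row-count argument above can be checked with a small model. The following is only a sketch: left_join_row_count is a hypothetical helper (not server code) that counts the rows produced by "A LEFT OUTER JOIN B ON B.id = A.id" when only the id columns are known.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper, for illustration only.
// a_ids: the A.id value of each row of tableA.
// b_ids: the B.id value of each row of tableB.
// Returns the number of rows of "A LEFT OUTER JOIN B ON B.id = A.id".
std::size_t left_join_row_count(const std::vector<int> &a_ids,
                                const std::vector<int> &b_ids)
{
  std::map<int, std::size_t> matches;   // B.id -> number of B rows with it
  for (int id : b_ids)
    ++matches[id];

  std::size_t rows= 0;
  for (int id : a_ids)
  {
    std::map<int, std::size_t>::const_iterator it= matches.find(id);
    // No match: one NULL-complemented row. Otherwise: one row per match.
    rows+= (it == matches.end()) ? 1 : it->second;
  }
  return rows;
}
```

When b_ids contains no duplicates (B.id is unique), every row of A contributes exactly one row, so the result equals the size of a_ids regardless of what tableB holds. That is the property that makes the join removable when no B column is selected.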
LOW-LEVEL DESIGN: The code (currently in development) is at lp: ~maria-captains/maria/maria-5.1-table-elimination tree. <contents> 1. Elimination criteria 2. No outside references check 2.1 Quick check if there are tables with no outside references 3. One-match check 3.1 Functional dependency source #1: Potential eq_ref access 3.2 Functional dependency source #2: col2=func(col1) 3.3 Functional dependency source #3: One or zero records in the table 3.4 Functional dependency check implementation 3.4.1 Equality collection: Option1 3.4.2 Equality collection: Option2 3.4.3 Functional dependency propagation - option 1 3.4.4 Functional dependency propagation - option 2 4. Removal operation properties 5. Removal operation 6. User interface 6.1 @@optimizer_switch flag 6.2 EXPLAIN [EXTENDED] 7. Miscellaneous adjustments 7.1 Fix used_tables() of aggregate functions 7.2 Make subquery predicates collect their outer references 8. Other concerns 8.1 Relationship with outer->inner joins converter 8.2 Relationship with prepared statements 8.3 Relationship with constant table detection 9. Tests and benchmarks </contents> It's not really about elimination of tables, it's about elimination of inner sides of outer joins. 1. Elimination criteria ======================= We can eliminate inner side of an outer join nest if: 1. There are no references to columns of the inner tables anywhere else in the query. 2. For each record combination of outer tables, it will always produce exactly one matching record combination. Most of effort in this WL entry is checking these two conditions. 2. No outside references check ============================== Criterion #1 means that the WHERE clause, ON clauses of embedding/subsequent outer joins, ORDER BY, GROUP BY and HAVING must have no references to inner tables of the outer join nest we're trying to remove. For multi-table UPDATE/DELETE we also must not remove tables that we're updating/deleting from or tables that are used in UPDATE's SET clause. 
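Criterion #1 reduces to a bitmap test once we know which tables each part of the query refers to. A minimal sketch, assuming one bit per table; Query_refs and no_outside_references are hypothetical names, and table_map here is a local stand-in typedef rather than the server's definition:

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t table_map;   // stand-in: one bit per table in the join

// Hypothetical summary of which tables each part of the query refers to.
struct Query_refs
{
  table_map select_list;
  table_map where;
  table_map group_order_having;
  table_map other_on_exprs;   // ON clauses of embedding/subsequent outer joins
  table_map update_targets;   // multi-table UPDATE/DELETE targets, SET clause
};

// Returns true if nothing outside the outer join nest refers to the nest's
// inner tables, i.e. criterion #1 holds for this nest.
bool no_outside_references(const Query_refs &q, table_map nest_inner_tables)
{
  table_map used_outside= q.select_list | q.where | q.group_order_having |
                          q.other_on_exprs | q.update_targets;
  return (used_outside & nest_inner_tables) == 0;
}
```

The nest's own ON expression is deliberately not part of used_outside: it is removed together with the nest.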
2.1 Quick check if there are tables with no outside references
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before we start searching for outer join nests that could be eliminated,
we'll do a quick and cheap check of whether there could possibly be anything
to eliminate:

  if (there are outer joins &&
      (tables used in select_list |
       tables used in group/order by UNION |
       tables used in where) != bitmap_of_all_join_tables)
  {
    attempt table elimination;
  }

3. One-match check
==================
We can eliminate the inner side of an outer join if it will always generate
exactly one matching record combination.

By definition of OUTER JOIN, a NULL-complemented record combination will be
generated when the inner side of the outer join has not produced any matches.

What remains to be checked is that there is no possibility that the inner
side of the outer join could produce more than one matching record
combination.

We'll refer to the one-match property as "functional dependency":

- An outer join nest is functionally dependent [wrt outer tables] if it will
  produce one matching record combination for each record combination of
  outer tables

- A table is functionally dependent wrt a certain set of dependency tables,
  if a record combination of dependency tables uniquely identifies zero or
  one matching record in the table

- Definitions of functional dependency of keys (=column tuples) and columns
  are apparent.

Our goal is to prove that the entire join nest is functionally-dependent.

A join nest is functionally dependent (on the outside tables) if each of its
elements (those can be either base tables or join nests) is functionally
dependent.

Functional dependency is transitive: if table A is f-dependent on the outer
tables and table B is f-dependent on {A, outer_tables} then B is functionally
dependent on the outer tables.

Subsequent sections list cases when we can declare a table to be
functionally-dependent.
3.1 Functional dependency source #1: Potential eq_ref access ------------------------------------------------------------ This is the most practically-important case. Taking the example from the HLD of this WL entry: select A.colA from tableA A left outer join tableB B on B.id = A.id; and generalizing it: a table TBL is functionally-dependent if the ON expression allows to construct a potential eq_ref access to table TBL that uses only outer or functionally-dependent tables. In other words: table TBL will have one match if the ON expression can be converted into this form TBL.unique_key=func(one_match_tables) AND .. remainder ... (with appropriate extension for multi-part keys), where one_match_tables= { tables that are not on the inner side of the outer join in question, and functionally dependent tables } Note that this will cover constant tables, except those that are constant because they have 0/1 record or are partitioned and have no used partitions. 3.2 Functional dependency source #2: col2=func(col1) ---------------------------------------------------- This comes from the second example in the HLS: create unique index idx on tableB (id, fromDate); ... left outer join tableB B on B.id = A.id and B.fromDate = (select max(sub.fromDate) from tableB sub where sub.id = A.id); Here it is apparent that tableB can be eliminated. It is not possible to construct eq_ref access to tableB, though, because for the second part of the primary key (fromDate column) we only got a condition in this form: B.fromDate= func(tableB) (we write "func(tableB)" because ref optimizer can only determine which tables the right part of the equality depends on). In general case, equality like this doesn't guarantee functional dependency. For example, if func() == { return fromDate;}, i.e the ON expression is ... ON B.id = A.id and B.fromDate = B.fromDate then that would allow table B to have multiple matches per record of table A. 
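The difference between sections 3.1 and 3.2 can be made concrete with table bitmaps. This is a sketch under assumptions: Keypart_equality and has_potential_eq_ref are hypothetical names, and table_map is a local stand-in typedef. A key part counts as covered only if the right-hand side of its equality refers exclusively to one-match tables.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::uint64_t table_map;   // stand-in: one bit per table in the join

// Hypothetical record for one AND-part "TBL.keypartN = expr".
struct Keypart_equality
{
  unsigned keypart;        // index of the key part within the unique key
  table_map expr_tables;   // tables that expr refers to
};

// Returns true if every part of the unique key is equated to an expression
// over one_match_tables only, i.e. a potential eq_ref access exists.
bool has_potential_eq_ref(unsigned n_keyparts,
                          const std::vector<Keypart_equality> &eqs,
                          table_map one_match_tables)
{
  std::vector<bool> covered(n_keyparts, false);
  for (const Keypart_equality &eq : eqs)
  {
    if (eq.keypart < n_keyparts && !(eq.expr_tables & ~one_match_tables))
      covered[eq.keypart]= true;
  }
  for (bool c : covered)
  {
    if (!c)
      return false;
  }
  return true;
}
```

With tableA on bit 0 and tableB on bit 1, "B.id = A.id" covers key part 0, while "B.fromDate = func(tableB)" has bit 1 set in expr_tables and therefore covers nothing. This is exactly why the second example needs the column-level analysis.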
In order to be able to distinguish between these two cases, we'll need to go down to column level: - A table is functionally dependent if it has a unique key that's functionally dependent - A unique key is functionally dependent when all of its columns are functionally dependent - A table column is functionally dependent if the ON clause allows to extract an AND-part in this form: tbl.column = f(functionally-dependent columns or columns of outer tables) 3.3 Functional dependency source #3: One or zero records in the table --------------------------------------------------------------------- A table with one or zero records cannot generate more than one matching record. This source is of lesser importance as one/zero-record tables are only MyISAM tables. 3.4 Functional dependency check implementation ---------------------------------------------- As shown above, we need something similar to KEYUSE structures, but not exactly that (we need things that current ref optimizer considers unusable and don't need things that it considers usable). 3.4.1 Equality collection: Option1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We could - extend KEYUSE structures to store all kinds of equalities we need - change update_ref_and_keys() and co. to collect equalities both for ref access and for table elimination = [possibly] Improve [eq_]ref access to be able to use equalities in form keypart2=func(keypart1) - process the KEYUSE array both by table elimination and by ref access optimizer. + This requires less effort. - Code will have to be changed all over sql_select.cc - update_ref_and_keys() and co. already do several unrelated things. Hooking up table elimination will make it even worse. 3.4.2 Equality collection: Option2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Alternatively, we could process the WHERE clause totally on our own. + Table elimination is standalone and easy to detach module. - Some code duplication with update_ref_and_keys() and co. 
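Whichever collection option is chosen, the column-level rules of section 3.2 amount to a small fixed-point computation. A minimal sketch, assuming columns are identified by name and that columns of outer tables are already bound (so they are left out of the argument lists); Equality and unique_key_is_bound are hypothetical names:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical record for one AND-part of the ON clause: column = f(args).
struct Equality
{
  std::string column;              // e.g. "B.fromDate"
  std::vector<std::string> args;   // inner-table columns used by f()
};

// Returns true if every column of unique_key can be shown to be bound.
bool unique_key_is_bound(const std::vector<std::string> &unique_key,
                         const std::vector<Equality> &equalities)
{
  std::set<std::string> bound;
  bool changed= true;
  while (changed)                  // repeat until no new column gets bound
  {
    changed= false;
    for (const Equality &eq : equalities)
    {
      if (bound.count(eq.column))
        continue;
      bool args_bound= true;
      for (const std::string &arg : eq.args)
      {
        if (!bound.count(arg))
          args_bound= false;
      }
      if (args_bound)
      {
        bound.insert(eq.column);
        changed= true;
      }
    }
  }
  for (const std::string &col : unique_key)
  {
    if (!bound.count(col))
      return false;
  }
  return true;
}
```

For the key (B.id, B.fromDate), the equalities "B.id = A.id" and "B.fromDate = func(B.id)" bind both columns, whereas "B.fromDate = B.fromDate" never binds B.fromDate because its only argument is itself.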
Having got the equalities, we'll need to propagate the functional dependency
property to unique keys, tables and, ultimately, join nests.

3.4.3 Functional dependency propagation - option 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Borrow the approach used in constant table detection code:

  do
  {
    converted= FALSE;
    for each table T in join nest
    {
      if (check_if_functionally_dependent(T))
        converted= TRUE;
    }
  } while (converted == TRUE);

  check_if_functionally_dependent(T)
  {
    if (T has eq_ref access based on func_dep_tables)
      return TRUE;

    Apply the same do-while loop-based approach to available equalities
      T.column1=func(other columns)
    to spread the set of functionally-dependent columns. The goal is to get
    all columns of a certain unique key to be bound.
  }

3.4.4 Functional dependency propagation - option 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Analyze the ON expression(s) and build a list of

  tbl.field = expr(...)

equalities. tbl here is a table that belongs to a join nest that could
potentially be eliminated.

Besides those, add to the list
 - An element for each unique key in the table that needs to be eliminated
 - An element for each table that needs to be eliminated
 - An element for each join nest that can be eliminated (i.e. has no
   references from outside).

Then, set up "reverse dependencies": each element should have pointers to
elements that are functionally dependent on it:

 - "tbl.field=expr(...)" equality is functionally dependent on all fields
   that are used in "expr(...)" (here we take into account only fields that
   belong to tables that can potentially be eliminated).
 - a unique key is dependent on all of its components
 - a table is dependent on all of its unique keys
 - a join nest is dependent on all tables that it contains

These pointers are stored in the form of one bitmap, such that:

  "X depends on Y" == test( bitmap[(X's number)*n_objects + (Y's number)] )

Each object also stores the number of dependencies that need to be satisfied
before it itself is satisfied:

 - "tbl.field=expr(...)" needs all its underlying fields (if a field is
   referenced many times it is counted only once)
 - a unique key needs all of its key parts
 - a table needs only one of its unique keys
 - a join nest needs all of its tables

(TODO: so what do we do when we've marked a table as constant? We'll need to
update the "field=expr(....)" elements that use fields of that table. And the
problem is that we won't know how much to decrement from the counters of
those elements.

Solution#1: switch to table_map() based approach.
Solution#2: introduce separate elements for each involved field.
            field will depend on its table,
            "field=expr" will depend on fields.
)

Besides the above, let each element have a pointer to another element, so
that we can have a linked list of elements.

After the above structures have been created, we start the main algorithm.

The first step is to create a list of functionally-dependent elements. We
walk across the array of dependencies and mark those elements that are
already bound (i.e. their dependencies are satisfied). At the moment the only
immediately-bound elements are "field=expr" dependencies that don't refer to
any columns that are not bound.

The second step is the loop

  while (bound_list is not empty)
  {
    Take the first bound element F off the list.
    Use the bitmap to find out what other elements depended on it
    for each such element E
    {
      if (E becomes bound after F is bound)
        add E to the list;
    }
  }

The last step is to walk through elements that represent the join nests.
Those that are bound can be eliminated.

4. Removal operation properties
===============================
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as many tables as possible (at least within
  our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.

5. Removal operation
====================
(This depends a lot on whether we make table elimination a one-off rewrite or
conditional)

At the moment table elimination is re-done for each join re-execution, hence
the removal operation is designed not to modify any statement's permanent
members.

* Remove the outer join nest's nested join structure (i.e. get the outer
  join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
  $OJ->embedding->nested_join. Update table_map's of all ancestor nested
  joins). [MARK2]

* Move the tables and their JOIN_TABs to the front of join order, like it is
  done with const tables, with the exception that if the eliminated outer
  join nest was within another outer join nest, that shouldn't prevent us
  from moving away the eliminated tables.

* Update join->table_count and all-join-tables bitmap.
  ^ TODO: not true anymore ^

* That's it. Nothing else?

6. User interface
=================

6.1 @@optimizer_switch flag
---------------------------
Argument against adding the flag:
* It is always better to perform table elimination than not to do it.

Arguments for the flag:
* It is always theoretically possible that the new code will cause
  unintended slowdowns.
* Having the flag is useful for QA and comparative benchmarking.

Decision so far: add the flag under #ifdef. Make the flag be present in debug
builds.

6.2 EXPLAIN [EXTENDED]
----------------------
There are two possible options:
1. Show eliminated tables, like we do with const tables.
2. Do not show eliminated tables.
We chose option 2, because: - the table is not accessed at all (besides locking it) - it is more natural for anchor model user - when he's querying an anchor- and attributes view, he doesn't care about the unused attributes. EXPLAIN EXTENDED+SHOW WARNINGS won't show the removed table either. NOTE: Before this WL, the warning text was generated after all JOIN objects have been destroyed. This didn't allow to use information about join plan when printing the warning. We've fixed this by keeping the JOIN objects until the warning text has been generated. Table elimination removes inner sides of outer join, and logically the ON clause is also removed. If this clause has any subqueries, they will be also removed from EXPLAIN output. An exception to the above is that if we eliminate a derived table, it will still be shown in EXPLAIN output. This comes from the fact that the FROM subqueries are evaluated before table elimination is invoked. TODO: Is the above ok or still remove parts of FROM subqueries? 7. Miscellaneous adjustments ============================ 7.1 Fix used_tables() of aggregate functions -------------------------------------------- Aggregate functions used to report that they depend on all tables, that is, item_agg_func->used_tables() == (1ULL << join->tables) - 1 always. Fixed it, now aggregate function reports that it depends on the tables that its arguments depend on. In particular, COUNT(*) reports that it depends on no tables (item_count_star->used_tables()==0). One consequence of that is that "item->used_tables()==0" is not equivalent to "item->const_item()==true" anymore (not sure if it's "anymore" or this has been already so for some items). 7.2 Make subquery predicates collect their outer references ----------------------------------------------------------- Per-column functional dependency analysis requires us to take a tbl.field = func(...) equality and tell which columns of which tables are referred from func(...) expression. 
For scalar expressions, this is accomplished by Item::walk()-based traversal. It should be reasonably cheap (the only practical Item that can be expensive to traverse seems to be a special case of "col IN (const1,const2, ...)". check if we traverse the long list for such items). For correlated subqueries, traversal can be expensive, it is cheaper to make each subquery item have a list of its outer references. The list can be collected at fix_fields() stage with very little extra cost, and then it could be used for other optimizations. 8. Other concerns ================= 8.1 Relationship with outer->inner joins converter -------------------------------------------------- One could suspect that outer->inner join conversion could get in the way of table elimination by changing outer joins (which could be eliminated) to inner (which we will not try to eliminate). This concern is not valid: we make outer->inner conversions based on predicates in WHERE. If the WHERE referred to an inner table (this is a requirement for the conversion) then table elimination would not be applicable anyway. 8.2 Relationship with prepared statements ----------------------------------------- On one hand, it's natural to desire to make table elimination a once-per-statement operation, like outer->inner join conversion. We'll have to limit the applicability by removing [MARK1] as that can change during lifetime of the statement. The other option is to do table elimination every time. This will require to rework operation [MARK2] to be undoable. 8.3 Relationship with constant table detection ---------------------------------------------- Table elimination is performed after constant table detection (but before the range analysis). Constant tables are technically different from eliminated ones (e.g. the former are shown in EXPLAIN and the latter aren't). Considering we've already done the join_read_const_table() call, is there any real difference between constant table and eliminated one? 
If there is, should we mark const tables also as eliminated?
From the user/EXPLAIN point of view: no. A constant table is one that we read
one record from. An eliminated table is one that we don't access at all.
TODO

9. Tests and benchmarks
=======================
Create a benchmark in sql-bench which checks if the DBMS has table
elimination.
[According to Monty] Run
 - query Q1 that would use elimination
 - query Q2 that is very similar to Q1 (so that they would have same
   QEP, execution cost, etc) but cannot use table elimination.
then compare run times and draw a conclusion about whether the DBMS in use
supports table elimination.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
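The counter-based propagation of section 3.4.4 can be sketched as follows. This is an illustration under assumptions, not the code in the tree: Element and propagate are hypothetical names, adjacency lists are used in place of the single bitmap described in the design, and separate field elements are used as in "Solution#2" of the TODO.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical graph node: an equality, a field, a unique key, a table or a
// join nest.
struct Element
{
  std::size_t unknown_args;              // dependencies still unsatisfied
  std::vector<std::size_t> dependents;   // elements that depend on this one
  bool bound;
};

// Runs the worklist loop and returns the final "bound" flag of each element.
std::vector<bool> propagate(std::vector<Element> elems)
{
  std::deque<std::size_t> bound_list;
  // Step 1: elements with no unsatisfied dependencies are bound right away.
  for (std::size_t i= 0; i < elems.size(); i++)
  {
    elems[i].bound= (elems[i].unknown_args == 0);
    if (elems[i].bound)
      bound_list.push_back(i);
  }
  // Step 2: take bound elements off the list and notify their dependents.
  while (!bound_list.empty())
  {
    std::size_t f= bound_list.front();
    bound_list.pop_front();
    for (std::size_t e : elems[f].dependents)
    {
      if (elems[e].bound)
        continue;   // e.g. a table already bound through another unique key
      if (--elems[e].unknown_args == 0)
      {
        elems[e].bound= true;
        bound_list.push_back(e);
      }
    }
  }
  // Step 3 (done by the caller): inspect the flags of the join nest elements.
  std::vector<bool> result;
  for (const Element &el : elems)
    result.push_back(el.bound);
  return result;
}
```

A table element starts with unknown_args == 1 because any one bound unique key suffices, while a unique key element starts with one count per key part. Each element is put on the list at most once, so the loop does work proportional to the number of dependency edges.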


[Maria-developers] Rev 2728: - Better comments in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 16 Aug '09


At file:///home/psergey/dev/maria-5.1-table-elim-r10/ ------------------------------------------------------------ revno: 2728 revision-id: psergey(a)askmonty.org-20090816124331-gd53m2alc0jb3ws4 parent: psergey(a)askmonty.org-20090816121708-v42h3mehvoy4c7yu committer: Sergey Petrunya <psergey(a)askmonty.org> branch nick: maria-5.1-table-elim-r10 timestamp: Sun 2009-08-16 15:43:31 +0300 message: - Better comments - Add OOM error checking === modified file 'sql/opt_table_elimination.cc' --- a/sql/opt_table_elimination.cc 2009-08-16 12:17:08 +0000 +++ b/sql/opt_table_elimination.cc 2009-08-16 12:43:31 +0000 @@ -184,18 +184,11 @@ unknown_args= n_children; } /* - Outer join we're representing. This can be a join nest or a one table that + Outer join we're representing. This can be a join nest or one table that is outer join'ed. */ TABLE_LIST *table_list; - /* - Tables within this outer join (and its descendants) that are not yet known - to be functionally dependent. - */ - table_map missing_tables; //psergey-todo: remove - /* All tables within this outer join and its descendants */ - table_map all_tables; //psergey-todo: remove /* Parent eliminable outer join, if any */ Outer_join_module *parent; }; @@ -229,11 +222,11 @@ }; static -void build_eq_deps_for_cond(Table_elimination *te, Equality_module **fdeps, +bool build_eq_deps_for_cond(Table_elimination *te, Equality_module **fdeps, uint *and_level, Item *cond, table_map usable_tables); static -void add_eq_dep(Table_elimination *te, Equality_module **eq_dep, +bool add_eq_dep(Table_elimination *te, Equality_module **eq_dep, uint and_level, Item_func *cond, Item *left, Item *right, table_map usable_tables); @@ -270,7 +263,7 @@ */ static -void build_eq_deps_for_cond(Table_elimination *te, Equality_module **fdeps, +bool build_eq_deps_for_cond(Table_elimination *te, Equality_module **fdeps, uint *and_level, Item *cond, table_map usable_tables) { @@ -285,7 +278,8 @@ Item *item; while ((item=li++)) { - 
build_eq_deps_for_cond(te, fdeps, and_level, item, usable_tables); + if (build_eq_deps_for_cond(te, fdeps, and_level, item, usable_tables)) + return TRUE; } /* TODO: inject here a "if we have {t.col=const AND t.col=smth_else}, then @@ -297,47 +291,52 @@ else { (*and_level)++; - build_eq_deps_for_cond(te, fdeps, and_level, li++, usable_tables); + if (build_eq_deps_for_cond(te, fdeps, and_level, li++, usable_tables)) + return TRUE; Item *item; while ((item=li++)) { Equality_module *start_key_fields= *fdeps; (*and_level)++; - build_eq_deps_for_cond(te, fdeps, and_level, item, usable_tables); + if (build_eq_deps_for_cond(te, fdeps, and_level, item, usable_tables)) + return TRUE; *fdeps= merge_func_deps(org_key_fields, start_key_fields, *fdeps, ++(*and_level)); } } - return; + return FALSE; } if (cond->type() != Item::FUNC_ITEM) - return; + return FALSE; Item_func *cond_func= (Item_func*) cond; Item **args= cond_func->arguments(); - Item *fld; switch (cond_func->functype()) { case Item_func::IN_FUNC: { if (cond_func->argument_count() == 2) { - add_eq_dep(te, fdeps, *and_level, cond_func, args[0], args[1], - usable_tables); - add_eq_dep(te, fdeps, *and_level, cond_func, args[1], args[0], - usable_tables); + if (add_eq_dep(te, fdeps, *and_level, cond_func, args[0], args[1], + usable_tables) || + add_eq_dep(te, fdeps, *and_level, cond_func, args[1], args[0], + usable_tables)) + return TRUE; } } case Item_func::BETWEEN: { + Item *fld; if (!((Item_func_between*)cond)->negated && + (fld= args[0]->real_item())->type() == Item::FIELD_ITEM && args[1]->eq(args[2], ((Item_field*)fld)->field->binary())) { - add_eq_dep(te, fdeps, *and_level, cond_func, args[0], args[1], - usable_tables); - add_eq_dep(te, fdeps, *and_level, cond_func, args[1], args[0], - usable_tables); + if (add_eq_dep(te, fdeps, *and_level, cond_func, args[0], args[1], + usable_tables) || + add_eq_dep(te, fdeps, *and_level, cond_func, args[1], args[0], + usable_tables)) + return TRUE; } break; } @@ -353,10 +352,9 
@@ case Item_func::ISNULL_FUNC: { Item *tmp=new Item_null; - if (unlikely(!tmp)) // Should never be true - return; - add_eq_dep(te, fdeps, *and_level, cond_func, args[0], args[1], - usable_tables); + if (!tmp || add_eq_dep(te, fdeps, *and_level, cond_func, args[0], args[1], + usable_tables)) + return TRUE; break; } case Item_func::MULT_EQUAL_FUNC: @@ -374,8 +372,9 @@ */ while ((item= it++)) { - add_eq_dep(te, fdeps, *and_level, cond_func, item, const_item, - usable_tables); + if (add_eq_dep(te, fdeps, *and_level, cond_func, item, const_item, + usable_tables)) + return TRUE; } } else @@ -395,8 +394,9 @@ { if (!field->eq(item2->field)) { - add_eq_dep(te, fdeps, *and_level, cond_func, item, item2, - usable_tables); + if (add_eq_dep(te, fdeps, *and_level, cond_func, item, item2, + usable_tables)) + return TRUE; } } it.rewind(); @@ -407,6 +407,7 @@ default: break; } + return FALSE; } @@ -536,7 +537,7 @@ */ static -void add_eq_dep(Table_elimination *te, Equality_module **eq_dep, +bool add_eq_dep(Table_elimination *te, Equality_module **eq_dep, uint and_level, Item_func *cond, Item *left, Item *right, table_map usable_tables) { @@ -550,7 +551,7 @@ if (right->result_type() != STRING_RESULT) { if (field->cmp_type() != right->result_type()) - return; + return FALSE; } else { @@ -560,17 +561,19 @@ */ if (field->cmp_type() == STRING_RESULT && ((Field_str*)field)->charset() != cond->compare_collation()) - return; + return FALSE; } } /* Store possible eq field */ (*eq_dep)->type= Module_dep::MODULE_EXPRESSION; //psergey-todo; - (*eq_dep)->field= get_field_value(te, field); + if (!((*eq_dep)->field= get_field_value(te, field))) + return TRUE; (*eq_dep)->expression= right; (*eq_dep)->level= and_level; (*eq_dep)++; } + return FALSE; }
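The error-handling convention this revision introduces (functions return TRUE on failure and callers stop at the first failure) can be shown in isolation. The following is a standalone sketch, not code from the tree: Arena, add_eq, add_both_directions and build_for_conjunction are hypothetical stand-ins for the allocator and for add_eq_dep()/build_eq_deps_for_cond().

```cpp
#include <cassert>

// Hypothetical fixed-capacity arena standing in for the server's allocator.
struct Arena
{
  int slots_left;
  bool alloc()
  {
    if (slots_left == 0)
      return false;
    slots_left--;
    return true;
  }
};

// Returns true on error (out of memory), false on success.
bool add_eq(Arena *arena, int *n_added)
{
  if (!arena->alloc())
    return true;
  (*n_added)++;
  return false;
}

// Registers an equality in both directions. The || operator short-circuits,
// so the second call is skipped once the first one has failed.
bool add_both_directions(Arena *arena, int *n_added)
{
  return add_eq(arena, n_added) || add_eq(arena, n_added);
}

// Processes n_conds equalities and propagates the first error to the caller.
bool build_for_conjunction(Arena *arena, int n_conds, int *n_added)
{
  for (int i= 0; i < n_conds; i++)
  {
    if (add_both_directions(arena, n_added))
      return true;
  }
  return false;
}
```

With room for three entries and two equalities to register, the fourth allocation fails and build_for_conjunction() returns true after three additions, leaving it to the caller to abandon the analysis.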


[Maria-developers] Rev 2727: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 16 Aug '09


At file:///home/psergey/dev/maria-5.1-table-elim-r10/ ------------------------------------------------------------ revno: 2727 revision-id: psergey(a)askmonty.org-20090816121708-v42h3mehvoy4c7yu parent: psergey(a)askmonty.org-20090816091549-da84w3nlmx8prmvm committer: Sergey Petrunya <psergey(a)askmonty.org> branch nick: maria-5.1-table-elim-r10 timestamp: Sun 2009-08-16 15:17:08 +0300 message: MWL#17: Table elimination - Address review feedback: change expression analyzer used to be a copy-paste of ref analyzer. ref analyzer, besides doing ref analysis, also collected info about keys which had sargable predicates. We didn't need that part here. === modified file 'sql/opt_table_elimination.cc' --- a/sql/opt_table_elimination.cc 2009-08-16 09:15:49 +0000 +++ b/sql/opt_table_elimination.cc 2009-08-16 12:17:08 +0000 @@ -233,11 +233,10 @@ uint *and_level, Item *cond, table_map usable_tables); static -void add_eq_dep(Table_elimination *te, - Equality_module **eq_dep, uint and_level, - Item_func *cond, Field *field, - bool eq_func, Item **value, - uint num_values, table_map usable_tables); +void add_eq_dep(Table_elimination *te, Equality_module **eq_dep, + uint and_level, + Item_func *cond, Item *left, Item *right, + table_map usable_tables); static Equality_module *merge_func_deps(Equality_module *start, Equality_module *new_fields, Equality_module *end, uint and_level); @@ -314,87 +313,54 @@ if (cond->type() != Item::FUNC_ITEM) return; + Item_func *cond_func= (Item_func*) cond; - switch (cond_func->select_optimize()) { - case Item_func::OPTIMIZE_NONE: - break; - case Item_func::OPTIMIZE_KEY: - { - Item **values; - // BETWEEN, IN, NE - if (cond_func->key_item()->real_item()->type() == Item::FIELD_ITEM && - !(cond_func->used_tables() & OUTER_REF_TABLE_BIT)) - { - values= cond_func->arguments()+1; - if (cond_func->functype() == Item_func::NE_FUNC && - cond_func->arguments()[1]->real_item()->type() == Item::FIELD_ITEM && - !(cond_func->arguments()[0]->used_tables() & 
OUTER_REF_TABLE_BIT)) - values--; - DBUG_ASSERT(cond_func->functype() != Item_func::IN_FUNC || - cond_func->argument_count() != 2); - add_eq_dep(te, fdeps, *and_level, cond_func, - ((Item_field*)(cond_func->key_item()->real_item()))->field, - 0, values, - cond_func->argument_count()-1, - usable_tables); - } - if (cond_func->functype() == Item_func::BETWEEN) - { - values= cond_func->arguments(); - for (uint i= 1 ; i < cond_func->argument_count() ; i++) - { - Item_field *field_item; - if (cond_func->arguments()[i]->real_item()->type() == Item::FIELD_ITEM - && - !(cond_func->arguments()[i]->used_tables() & OUTER_REF_TABLE_BIT)) - { - field_item= (Item_field *) (cond_func->arguments()[i]->real_item()); - add_eq_dep(te, fdeps, *and_level, cond_func, - field_item->field, 0, values, 1, usable_tables); - } - } - } - break; - } - case Item_func::OPTIMIZE_OP: - { - bool equal_func=(cond_func->functype() == Item_func::EQ_FUNC || - cond_func->functype() == Item_func::EQUAL_FUNC); + Item **args= cond_func->arguments(); + Item *fld; - if (cond_func->arguments()[0]->real_item()->type() == Item::FIELD_ITEM && - !(cond_func->arguments()[0]->used_tables() & OUTER_REF_TABLE_BIT)) - { - add_eq_dep(te, fdeps, *and_level, cond_func, - ((Item_field*)(cond_func->arguments()[0])->real_item())->field, - equal_func, - cond_func->arguments()+1, 1, usable_tables); - } - if (cond_func->arguments()[1]->real_item()->type() == Item::FIELD_ITEM && - cond_func->functype() != Item_func::LIKE_FUNC && - !(cond_func->arguments()[1]->used_tables() & OUTER_REF_TABLE_BIT)) - { - add_eq_dep(te, fdeps, *and_level, cond_func, - ((Item_field*)(cond_func->arguments()[1])->real_item())->field, - equal_func, - cond_func->arguments(),1,usable_tables); - } - break; - } - case Item_func::OPTIMIZE_NULL: - /* column_name IS [NOT] NULL */ - if (cond_func->arguments()[0]->real_item()->type() == Item::FIELD_ITEM && - !(cond_func->used_tables() & OUTER_REF_TABLE_BIT)) - { - Item *tmp=new Item_null; - if (unlikely(!tmp)) 
// Should never be true
-      return;
-    add_eq_dep(te, fdeps, *and_level, cond_func,
-               ((Item_field*)(cond_func->arguments()[0])->real_item())->field,
-               cond_func->functype() == Item_func::ISNULL_FUNC,
-               &tmp, 1, usable_tables);
-  }
-  break;
-  case Item_func::OPTIMIZE_EQUAL:
+  switch (cond_func->functype()) {
+  case Item_func::IN_FUNC:
+  {
+    if (cond_func->argument_count() == 2)
+    {
+      add_eq_dep(te, fdeps, *and_level, cond_func, args[0], args[1],
+                 usable_tables);
+      add_eq_dep(te, fdeps, *and_level, cond_func, args[1], args[0],
+                 usable_tables);
+    }
+  }
+  case Item_func::BETWEEN:
+  {
+    if (!((Item_func_between*)cond)->negated &&
+        args[1]->eq(args[2], ((Item_field*)fld)->field->binary()))
+    {
+      add_eq_dep(te, fdeps, *and_level, cond_func, args[0], args[1],
+                 usable_tables);
+      add_eq_dep(te, fdeps, *and_level, cond_func, args[1], args[0],
+                 usable_tables);
+    }
+    break;
+  }
+  case Item_func::EQ_FUNC:
+  case Item_func::EQUAL_FUNC:
+  {
+    add_eq_dep(te, fdeps, *and_level, cond_func, args[0], args[1],
+               usable_tables);
+    add_eq_dep(te, fdeps, *and_level, cond_func, args[1], args[0],
+               usable_tables);
+    break;
+  }
+  case Item_func::ISNULL_FUNC:
+  {
+    Item *tmp=new Item_null;
+    if (unlikely(!tmp)) // Should never be true
+      return;
+    add_eq_dep(te, fdeps, *and_level, cond_func, args[0], args[1],
+               usable_tables);
+    break;
+  }
+  case Item_func::MULT_EQUAL_FUNC:
+  {
     Item_equal *item_equal= (Item_equal *) cond;
     Item *const_item= item_equal->get_const();
     Item_equal_iterator it(*item_equal);
@@ -408,8 +374,8 @@
       */
       while ((item= it++))
       {
-        add_eq_dep(te, fdeps, *and_level, cond_func, item->field,
-                   TRUE, &const_item, 1, usable_tables);
+        add_eq_dep(te, fdeps, *and_level, cond_func, item, const_item,
+                   usable_tables);
       }
     }
     else
@@ -424,12 +390,13 @@
       while ((item= fi++))
       {
         Field *field= item->field;
-        while ((item= it++))
+        Item_field *item2;
+        while ((item2= it++))
         {
-          if (!field->eq(item->field))
+          if (!field->eq(item2->field))
           {
-            add_eq_dep(te, fdeps, *and_level, cond_func, field,
-                       TRUE, (Item **) &item, 1, usable_tables);
+            add_eq_dep(te, fdeps, *and_level, cond_func, item, item2,
+                       usable_tables);
           }
         }
         it.rewind();
@@ -437,6 +404,9 @@
     }
     break;
   }
+  default:
+    break;
+  }
 }
 
 
@@ -567,75 +537,40 @@
 
 static void
 add_eq_dep(Table_elimination *te, Equality_module **eq_dep,
-           uint and_level, Item_func *cond, Field *field,
-           bool eq_func, Item **value, uint num_values,
-           table_map usable_tables)
+           uint and_level, Item_func *cond,
+           Item *left, Item *right, table_map usable_tables)
 {
-  if (!(field->table->map & usable_tables))
-    return;
-
-  for (uint i=0; i<num_values; i++)
-  {
-    if ((value[i])->used_tables() & RAND_TABLE_BIT)
-      return;
-  }
-
-  /*
-    Save the following cases:
-    Field op constant
-    Field LIKE constant where constant doesn't start with a wildcard
-    Field = field2 where field2 is in a different table
-    Field op formula
-    Field IS NULL
-    Field IS NOT NULL
-    Field BETWEEN ...
-    Field IN ...
-  */
-
-  /*
-    We can't always use indexes when comparing a string index to a
-    number. cmp_type() is checked to allow compare of dates to numbers.
-    eq_func is NEVER true when num_values > 1
-  */
-  if (!eq_func)
-  {
-    /*
-      Additional optimization: if we're processing "t.key BETWEEN c1 AND c1"
-      then proceed as if we were processing "t.key = c1".
-    */
-    if ((cond->functype() != Item_func::BETWEEN) ||
-        ((Item_func_between*) cond)->negated ||
-        !value[0]->eq(value[1], field->binary()))
-      return;
-    eq_func= TRUE;
-  }
-
-  if (field->result_type() == STRING_RESULT)
-  {
-    if ((*value)->result_type() != STRING_RESULT)
-    {
-      if (field->cmp_type() != (*value)->result_type())
-        return;
-    }
-    else
-    {
-      /*
-        We can't use indexes if the effective collation
-        of the operation differ from the field collation.
-      */
-      if (field->cmp_type() == STRING_RESULT &&
-          ((Field_str*)field)->charset() != cond->compare_collation())
-        return;
-    }
-  }
-
-  DBUG_ASSERT(eq_func);
-  /* Store possible eq field */
-  (*eq_dep)->type= Module_dep::MODULE_EXPRESSION; //psergey-todo;
-  (*eq_dep)->field= get_field_value(te, field);
-  (*eq_dep)->expression= *value;
-  (*eq_dep)->level= and_level;
-  (*eq_dep)++;
+  if ((left->used_tables() & usable_tables) &&
+      !(right->used_tables() & RAND_TABLE_BIT) &&
+      left->real_item()->type() == Item::FIELD_ITEM)
+  {
+    Field *field= ((Item_field*)left->real_item())->field;
+    if (field->result_type() == STRING_RESULT)
+    {
+      if (right->result_type() != STRING_RESULT)
+      {
+        if (field->cmp_type() != right->result_type())
+          return;
+      }
+      else
+      {
+        /*
+          We can't use indexes if the effective collation
+          of the operation differ from the field collation.
+        */
+        if (field->cmp_type() == STRING_RESULT &&
+            ((Field_str*)field)->charset() != cond->compare_collation())
+          return;
+      }
+    }
+
+    /* Store possible eq field */
+    (*eq_dep)->type= Module_dep::MODULE_EXPRESSION; //psergey-todo;
+    (*eq_dep)->field= get_field_value(te, field);
+    (*eq_dep)->expression= right;
+    (*eq_dep)->level= and_level;
+    (*eq_dep)++;
+  }
 }
 
 
@@ -1150,6 +1085,12 @@
       {
         Outer_join_module *outer_join_dep= (Outer_join_module*)bound_modules;
         mark_as_eliminated(te->join, outer_join_dep->table_list);
+        if (!--te->n_outer_joins)
+        {
+          DBUG_PRINT("info", ("Table elimination eliminated everything"
+                              " it theoretically could"));
+          return;
+        }
         break;
       }
       case Module_dep::MODULE_MULTI_EQUALITY:

[Maria-developers] Rev 2726: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 16 Aug '09

At file:///home/psergey/dev/maria-5.1-table-elim-r10/

------------------------------------------------------------
revno: 2726
revision-id: psergey(a)askmonty.org-20090816091549-da84w3nlmx8prmvm
parent: psergey(a)askmonty.org-20090816072524-w9fu2hy23pjwlr8z
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r10
timestamp: Sun 2009-08-16 12:15:49 +0300
message:
  MWL#17: Table elimination
  - code cleanup
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc	2009-08-16 07:25:24 +0000
+++ b/sql/opt_table_elimination.cc	2009-08-16 09:15:49 +0000
@@ -125,7 +125,7 @@
     make elements that depend on them bound, too.
   */
   Module_dep *next;
-  uint unknown_args; /* TRUE<=> The entity is considered bound */
+  uint unknown_args;
   Module_dep() : next(NULL), unknown_args(0) {}
 };
 
@@ -249,11 +249,9 @@
 void eliminate_tables(JOIN *join);
 static void mark_as_eliminated(JOIN *join, TABLE_LIST *tbl);
 
-#if 0
 #ifndef DBUG_OFF
 static void dbug_print_deps(Table_elimination *te);
 #endif
-#endif
 
 /*******************************************************************************************/
 /*
@@ -854,7 +852,7 @@
 
 
 /*
-  This is used to analyse expressions in "tbl.col=expr" dependencies so
+  This is used to analyze expressions in "tbl.col=expr" dependencies so
   that we can figure out which fields the expression depends on.
 */
 
@@ -965,7 +963,7 @@
   }
   *bound_deps_list= bound_dep;
 
-  //DBUG_EXECUTE("test", dbug_print_deps(te); );
+  DBUG_EXECUTE("test", dbug_print_deps(te); );
   DBUG_RETURN(FALSE);
 }
 
@@ -1089,7 +1087,6 @@
 void signal_from_field_to_exprs(Table_elimination* te, Field_value *field_dep,
                                 Module_dep **bound_modules)
 {
-  /* Now, expressions */
   for (uint i=0; i < te->n_equality_deps; i++)
   {
     if (bitmap_is_set(&te->expr_deps, field_dep->bitmap_offset + i) &&
@@ -1213,7 +1210,6 @@
         for (Outer_join_module *outer_join_dep= table_dep->outer_join_dep;
              outer_join_dep; outer_join_dep= outer_join_dep->parent)
         {
-          //if (!(outer_join_dep->missing_tables &= ~table_dep->table->map))
           if (outer_join_dep->unknown_args &&
               !--outer_join_dep->unknown_args)
           {
@@ -1268,7 +1264,6 @@
 }
 
 
-#if 0
 #ifndef DBUG_OFF
 static
 void dbug_print_deps(Table_elimination *te)
@@ -1324,7 +1319,6 @@
 }
 
 #endif
-#endif
 /**
   @} (end of group Table_Elimination)
 */

[Maria-developers] Updated (by Psergey): Make --replicate-(do, ignore)-(db, table) behaviour for RBR identical to that of SBR (49)
by worklog-noreply＠askmonty.org 16 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Make --replicate-(do,ignore)-(db,table) behaviour for RBR identical to that of SBR
CREATION DATE..: Sun, 16 Aug 2009, 12:01
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 49 (http://askmonty.org/worklog/?tid=49)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Sun, 16 Aug 2009, 12:06)=-=-
High-Level Specification modified.
--- /tmp/wklog.49.old.14928 2009-08-16 12:06:27.000000000 +0300
+++ /tmp/wklog.49.new.14928 2009-08-16 12:06:27.000000000 +0300
@@ -1 +1,12 @@
+Some notes:
+
+The only required changes are on the slave. The slave can see bounds between
+statements (they are delimited by Query_event and Table_map_event entries),
+Table_map_event lists all tables that are going to be updated => it is possible
+to make a decision whether we should skip the statement, and if yes, skip all
+RBR events that belong to the statement.
+
+Possible syntax for options
+--replicate-wild-ignore-stmt-with-table=%.tmptbl%
+--replicate-ignore-stmt-with-table=tmptbl

-=-=(Psergey - Sun, 16 Aug 2009, 12:02)=-=-
High Level Description modified.
--- /tmp/wklog.49.old.14739 2009-08-16 12:02:05.000000000 +0300
+++ /tmp/wklog.49.new.14739 2009-08-16 12:02:05.000000000 +0300
@@ -6,8 +6,8 @@
 This can be inconvenient, and also the semantics gets really complicated when
 --binlog_format=mixed is used.
 
-This WL entry is about making processing of RBR events to work the same as SBR
-events did.
+This WL entry is about adding an option to make processing of RBR events to work
+the same as SBR events did.


DESCRIPTION:

At the moment semantics of --replicate-(do,ignore)-(db,table) rules is different
for RBR and SBR:
http://dev.mysql.com/doc/refman/5.1/en/replication-rules-table-options.html

This can be inconvenient, and also the semantics gets really complicated when
--binlog_format=mixed is used.

This WL entry is about adding an option to make processing of RBR events to work
the same as SBR events did.

HIGH-LEVEL SPECIFICATION:

Some notes:

The only required changes are on the slave. The slave can see bounds between
statements (they are delimited by Query_event and Table_map_event entries),
Table_map_event lists all tables that are going to be updated => it is possible
to make a decision whether we should skip the statement, and if yes, skip all
RBR events that belong to the statement.

Possible syntax for options
--replicate-wild-ignore-stmt-with-table=%.tmptbl%
--replicate-ignore-stmt-with-table=tmptbl

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
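
The notes above describe a slave-side decision: read the Table_map_event entries that open a row-based statement, and if any listed table is ignored, skip every RBR event of that statement. A minimal sketch of that idea follows. Everything in it is an assumption made for illustration: the Event struct, the EventKind enum and plan_apply() are hypothetical names rather than server classes, a statement is modelled simply as a run of table-map events followed by its row events, and only exact "db.table" names are matched (the wildcard variant is not covered).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified event model (not the server's classes).
enum class EventKind { Query, TableMap, Rows };

struct Event {
  EventKind kind;
  std::string table;  // "db.table" for TableMap events, empty otherwise
};

// For each event, decide whether the slave would apply it.
// `ignored` plays the role of --replicate-ignore-stmt-with-table values.
// Assumption: a row-based statement is a run of TableMap events followed
// by the Rows events that belong to it.
std::vector<bool> plan_apply(const std::vector<Event>& events,
                             const std::vector<std::string>& ignored) {
  std::vector<bool> apply(events.size(), true);
  std::size_t i = 0;
  while (i < events.size()) {
    if (events[i].kind != EventKind::TableMap) {
      ++i;  // Query events are outside the scope of this sketch
      continue;
    }
    const std::size_t begin = i;
    bool skip = false;
    // Collect the statement's table maps; one ignored table is enough.
    while (i < events.size() && events[i].kind == EventKind::TableMap) {
      if (std::find(ignored.begin(), ignored.end(), events[i].table) !=
          ignored.end())
        skip = true;
      ++i;
    }
    // The Rows events up to the next statement boundary belong to it.
    while (i < events.size() && events[i].kind == EventKind::Rows) ++i;
    assert(i > begin);  // at least the first TableMap was consumed
    if (skip)
      std::fill(apply.begin() + begin, apply.begin() + i, false);
  }
  return apply;
}
```

The decision is taken per statement, not per event, so a statement touching both a wanted and an ignored table is dropped as a whole; whether that is the desired semantics is one of the open points of this task.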

[Maria-developers] Updated (by Psergey): Make --replicate-(do, ignore)-(db, table) behaviour for RBR identical to that of SBR (49)
by worklog-noreply＠askmonty.org 16 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Make --replicate-(do,ignore)-(db,table) behaviour for RBR identical to that of SBR
CREATION DATE..: Sun, 16 Aug 2009, 12:01
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 49 (http://askmonty.org/worklog/?tid=49)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Sun, 16 Aug 2009, 12:02)=-=-
High Level Description modified.
--- /tmp/wklog.49.old.14739 2009-08-16 12:02:05.000000000 +0300
+++ /tmp/wklog.49.new.14739 2009-08-16 12:02:05.000000000 +0300
@@ -6,8 +6,8 @@
 This can be inconvenient, and also the semantics gets really complicated when
 --binlog_format=mixed is used.
 
-This WL entry is about making processing of RBR events to work the same as SBR
-events did.
+This WL entry is about adding an option to make processing of RBR events to work
+the same as SBR events did.


DESCRIPTION:

At the moment semantics of --replicate-(do,ignore)-(db,table) rules is different
for RBR and SBR:
http://dev.mysql.com/doc/refman/5.1/en/replication-rules-table-options.html

This can be inconvenient, and also the semantics gets really complicated when
--binlog_format=mixed is used.

This WL entry is about adding an option to make processing of RBR events to work
the same as SBR events did.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

[Maria-developers] New (by Psergey): Make --replicate-(do, ignore)-(db, table) behaviour for RBR identical to that of SBR (49)
by worklog-noreply＠askmonty.org 16 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Make --replicate-(do,ignore)-(db,table) behaviour for RBR identical to that of SBR
CREATION DATE..: Sun, 16 Aug 2009, 12:01
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 49 (http://askmonty.org/worklog/?tid=49)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:


DESCRIPTION:

At the moment semantics of --replicate-(do,ignore)-(db,table) rules is different
for RBR and SBR:
http://dev.mysql.com/doc/refman/5.1/en/replication-rules-table-options.html

This can be inconvenient, and also the semantics gets really complicated when
--binlog_format=mixed is used.

This WL entry is about making processing of RBR events to work the same as SBR
events did.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

[Maria-developers] Updated (by Psergey): Change BINLOG statement syntax to be human-readable (46)
by worklog-noreply＠askmonty.org 16 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Change BINLOG statement syntax to be human-readable
CREATION DATE..: Sat, 15 Aug 2009, 23:42
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 46 (http://askmonty.org/worklog/?tid=46)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Sun, 16 Aug 2009, 11:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.46.old.13453 2009-08-16 11:30:06.000000000 +0300
+++ /tmp/wklog.46.new.13453 2009-08-16 11:30:06.000000000 +0300
@@ -26,3 +26,7 @@
     (due to locking, table open/close, etc). (TODO: is it really slower? we
     haven't checked).
 
+* When SBR replication is used and the statements refer to the current database
+  (a common scenario), one can use awk to filter out updates made in certain
+  databases. The proposed syntax doesn't allow to perform equivalent filtering?
+

-=-=(Psergey - Sun, 16 Aug 2009, 11:13)=-=-
High Level Description modified.
--- /tmp/wklog.46.old.12747 2009-08-16 11:13:54.000000000 +0300
+++ /tmp/wklog.46.new.12747 2009-08-16 11:13:54.000000000 +0300
@@ -6,4 +6,4 @@
 This WL task is about making BINLOG statements to be human-readable (either as
 an option or by default
 
-The approach of this WL is to some extent an alternative to WL#38, WL#40, WL41.
+The approach of this WL is to some extent an alternative to WL#38, WL#40, WL#41.

-=-=(Psergey - Sun, 16 Aug 2009, 11:13)=-=-
High Level Description modified.
--- /tmp/wklog.46.old.12717 2009-08-16 11:13:40.000000000 +0300
+++ /tmp/wklog.46.new.12717 2009-08-16 11:13:40.000000000 +0300
@@ -5,3 +5,5 @@
 
 This WL task is about making BINLOG statements to be human-readable (either as
 an option or by default
+
+The approach of this WL is to some extent an alternative to WL#38, WL#40, WL41.

-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency deleted: 48 no longer depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency created: 48 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency deleted: 39 no longer depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sat, 15 Aug 2009, 23:43)=-=-
High-Level Specification modified.
--- /tmp/wklog.46.old.17742 2009-08-15 23:43:09.000000000 +0300
+++ /tmp/wklog.46.new.17742 2009-08-15 23:43:09.000000000 +0300
@@ -1 +1,28 @@
+Suggestion 1
+------------
+Original syntax suggestion by Kristian:
+
+  BINLOG
+    WITH TIMESTAMP xxx SERVER_ID 1 MASTER_POS 415 FLAGS 0x0
+    TABLE db1.table1 AS 1 COLUMNS (INT NOT NULL, BLOB, VARCHAR(100)) FLAGS 0x0
+    TABLE db2.table2 AS 2 COLUMNS (CHAR(10)) FLAGS 0x0
+    WRITE_ROW INTO db1.table1(1,3) VALUES (42, 'foobar'), (10, NULL) FLAGS 0x2
+    UPDATE_ROW INTO db2.table2 (1) (1) VALUES FROM ('beforeval') TO ('toval'),
+       FROM ('a') TO ('b') FLAGS 0x0
+    DELETE_ROW INTO db2.table2 (1) VALUES ('row_to_delete') FLAGS 0x0;
+
+  This is basically a dump of what is stored in the events, and would be an
+  alternative to BINLOG 'gwWEShMBAA...'.
+
+Feedback and other suggestions
+------------------------------
+* What is the need for WITH TIMESTAMP part? Can't one use a separate
+  SET TIMESTAMP statement?
+
+* mysqlbinlog --base64-output=DECODE-ROWS --verbose already produces something
+  that's close to readable SQL. Can we make it to be regular parseable SQL?
+  + This will be syntax that's familiar to our parser and to the users
+  - A stream of SQL statements will be slower to run than BINLOG statements
+    (due to locking, table open/close, etc). (TODO: is it really slower? we
+    haven't checked).


DESCRIPTION:

One of great things about mysqlbinlog was that its output was human-readable
SQL, so it was possible to edit it manually or with help of scripts. With RBR
events and BINLOG 'DpiGShMBAAAALQAAADcBAA...' statements this is no longer the
case.

This WL task is about making BINLOG statements to be human-readable (either as
an option or by default

The approach of this WL is to some extent an alternative to WL#38, WL#40, WL#41.

HIGH-LEVEL SPECIFICATION:

Suggestion 1
------------
Original syntax suggestion by Kristian:

  BINLOG
    WITH TIMESTAMP xxx SERVER_ID 1 MASTER_POS 415 FLAGS 0x0
    TABLE db1.table1 AS 1 COLUMNS (INT NOT NULL, BLOB, VARCHAR(100)) FLAGS 0x0
    TABLE db2.table2 AS 2 COLUMNS (CHAR(10)) FLAGS 0x0
    WRITE_ROW INTO db1.table1(1,3) VALUES (42, 'foobar'), (10, NULL) FLAGS 0x2
    UPDATE_ROW INTO db2.table2 (1) (1) VALUES FROM ('beforeval') TO ('toval'),
       FROM ('a') TO ('b') FLAGS 0x0
    DELETE_ROW INTO db2.table2 (1) VALUES ('row_to_delete') FLAGS 0x0;

  This is basically a dump of what is stored in the events, and would be an
  alternative to BINLOG 'gwWEShMBAA...'.

Feedback and other suggestions
------------------------------
* What is the need for WITH TIMESTAMP part? Can't one use a separate
  SET TIMESTAMP statement?

* mysqlbinlog --base64-output=DECODE-ROWS --verbose already produces something
  that's close to readable SQL. Can we make it to be regular parseable SQL?
  + This will be syntax that's familiar to our parser and to the users
  - A stream of SQL statements will be slower to run than BINLOG statements
    (due to locking, table open/close, etc). (TODO: is it really slower? we
    haven't checked).

* When SBR replication is used and the statements refer to the current database
  (a common scenario), one can use awk to filter out updates made in certain
  databases. The proposed syntax doesn't allow to perform equivalent filtering?

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
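
The last feedback item refers to filtering a statement-based stream by current database, as one might do with awk. A rough sketch of that kind of filter is given below, purely to make the comparison concrete. It rests on assumptions: parse_use() and drop_database() are hypothetical helpers, and the input is assumed to be already split into statements with database switches normalised to the exact form "use <db>;" (real mysqlbinlog output is not guaranteed to look like that).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// If `stmt` has the form "use <db>;", store <db> and return true.
// Assumption: statements were already normalised to this exact form.
static bool parse_use(const std::string& stmt, std::string* db) {
  const std::string prefix = "use ";
  if (stmt.compare(0, prefix.size(), prefix) != 0) return false;
  const std::size_t end = stmt.find(';', prefix.size());
  *db = stmt.substr(prefix.size(),
                    end == std::string::npos ? end : end - prefix.size());
  return true;
}

// Drop every statement executed while `unwanted_db` is the current
// database, including the "use" statement that selected it.
std::vector<std::string> drop_database(const std::vector<std::string>& stmts,
                                       const std::string& unwanted_db) {
  std::vector<std::string> kept;
  bool skipping = false;
  std::string db;
  for (const std::string& stmt : stmts) {
    if (parse_use(stmt, &db)) skipping = (db == unwanted_db);
    if (!skipping) kept.push_back(stmt);
  }
  return kept;
}
```

The filter only needs to track one piece of state, the current database, which is why a line-oriented tool is enough for SBR output. Statements that name another database explicitly (db.table) are not caught by it.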

[Maria-developers] Updated (by Psergey): Change BINLOG statement syntax to be human-readable (46)
by worklog-noreply＠askmonty.org 16 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Change BINLOG statement syntax to be human-readable
CREATION DATE..: Sat, 15 Aug 2009, 23:42
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 46 (http://askmonty.org/worklog/?tid=46)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Sun, 16 Aug 2009, 11:13)=-=-
High Level Description modified.
--- /tmp/wklog.46.old.12747     2009-08-16 11:13:54.000000000 +0300
+++ /tmp/wklog.46.new.12747     2009-08-16 11:13:54.000000000 +0300
@@ -6,4 +6,4 @@
 This WL task is about making BINLOG statements to be human-readable (either as
 an option or by default
 
-The approach of this WL is to some extent an alternative to WL#38, WL#40, WL41.
+The approach of this WL is to some extent an alternative to WL#38, WL#40, WL#41.

-=-=(Psergey - Sun, 16 Aug 2009, 11:13)=-=-
High Level Description modified.
--- /tmp/wklog.46.old.12717     2009-08-16 11:13:40.000000000 +0300
+++ /tmp/wklog.46.new.12717     2009-08-16 11:13:40.000000000 +0300
@@ -5,3 +5,5 @@
 
 This WL task is about making BINLOG statements to be human-readable (either as
 an option or by default
+
+The approach of this WL is to some extent an alternative to WL#38, WL#40, WL41.

-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency deleted: 48 no longer depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency created: 48 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency deleted: 39 no longer depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sat, 15 Aug 2009, 23:43)=-=-
High-Level Specification modified.
--- /tmp/wklog.46.old.17742     2009-08-15 23:43:09.000000000 +0300
+++ /tmp/wklog.46.new.17742     2009-08-15 23:43:09.000000000 +0300
@@ -1 +1,28 @@
 
+Suggestion 1
+------------
+Original syntax suggestion by Kristian:
+
+  BINLOG
+    WITH TIMESTAMP xxx SERVER_ID 1 MASTER_POS 415 FLAGS 0x0
+    TABLE db1.table1 AS 1 COLUMNS (INT NOT NULL, BLOB, VARCHAR(100)) FLAGS 0x0
+    TABLE db2.table2 AS 2 COLUMNS (CHAR(10)) FLAGS 0x0
+    WRITE_ROW INTO db1.table1(1,3) VALUES (42, 'foobar'), (10, NULL) FLAGS 0x2
+    UPDATE_ROW INTO db2.table2 (1) (1) VALUES FROM ('beforeval') TO ('toval'),
+      FROM ('a') TO ('b') FLAGS 0x0
+    DELETE_ROW INTO db2.table2 (1) VALUES ('row_to_delete') FLAGS 0x0;
+
+  This is basically a dump of what is stored in the events, and would be an
+  alternative to BINLOG 'gwWEShMBAA...'.
+
+Feedback and other suggestions
+------------------------------
+* What is the need for WITH TIMESTAMP part? Can't one use a separate
+  SET TIMESTAMP statement?
+
+* mysqlbinlog --base64-output=DECODE-ROWS --verbose already produces something
+  that's close to readable SQL. Can we make it to be regular parseable SQL?
+  + This will be syntax that's familiar to our parser and to the users
+  - A stream of SQL statements will be slower to run than BINLOG statements
+    (due to locking, table open/close, etc). (TODO: is it really slower? we
+    haven't checked).

DESCRIPTION:

One of the great things about mysqlbinlog was that its output was
human-readable SQL, so it was possible to edit it manually or with the help of
scripts. With RBR events and BINLOG 'DpiGShMBAAAALQAAADcBAA...' statements this
is no longer the case.

This WL task is about making BINLOG statements human-readable (either as an
option or by default).

The approach of this WL is to some extent an alternative to WL#38, WL#40, WL#41.


HIGH-LEVEL SPECIFICATION:

Suggestion 1
------------
Original syntax suggestion by Kristian:

  BINLOG
    WITH TIMESTAMP xxx SERVER_ID 1 MASTER_POS 415 FLAGS 0x0
    TABLE db1.table1 AS 1 COLUMNS (INT NOT NULL, BLOB, VARCHAR(100)) FLAGS 0x0
    TABLE db2.table2 AS 2 COLUMNS (CHAR(10)) FLAGS 0x0
    WRITE_ROW INTO db1.table1(1,3) VALUES (42, 'foobar'), (10, NULL) FLAGS 0x2
    UPDATE_ROW INTO db2.table2 (1) (1) VALUES FROM ('beforeval') TO ('toval'),
      FROM ('a') TO ('b') FLAGS 0x0
    DELETE_ROW INTO db2.table2 (1) VALUES ('row_to_delete') FLAGS 0x0;

  This is basically a dump of what is stored in the events, and would be an
  alternative to BINLOG 'gwWEShMBAA...'.

Feedback and other suggestions
------------------------------
* What is the need for the WITH TIMESTAMP part? Can't one use a separate
  SET TIMESTAMP statement?

* mysqlbinlog --base64-output=DECODE-ROWS --verbose already produces something
  that's close to readable SQL. Can we make it regular, parseable SQL?
  + This will be syntax that's familiar to our parser and to the users
  - A stream of SQL statements will be slower to run than BINLOG statements
    (due to locking, table open/close, etc). (TODO: is it really slower? we
    haven't checked).


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
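
To illustrate why the base64 form is hard to edit by hand, here is a minimal sketch that decodes just the common header of the first event inside a BINLOG payload. It assumes the 19-byte, little-endian v4 event header (timestamp, type code, server id, event length, next position, flags); `decode_event_header` and `EVENT_NAMES` are hypothetical helpers, and the sample event is synthetic rather than taken from a real binlog.

```python
import base64
import struct
from datetime import datetime, timezone

# Common header of a v4 binlog event: 4-byte timestamp, 1-byte type code,
# 4-byte server id, 4-byte event length, 4-byte next position, 2-byte flags.
HEADER = struct.Struct("<IBIIIH")

# A few row-based event type codes as used by MySQL 5.1 (not exhaustive).
EVENT_NAMES = {
    19: "TABLE_MAP_EVENT",
    23: "WRITE_ROWS_EVENT",
    24: "UPDATE_ROWS_EVENT",
    25: "DELETE_ROWS_EVENT",
}


def decode_event_header(b64_payload):
    """Decode the header of the first event in a base64 BINLOG payload."""
    raw = base64.b64decode(b64_payload)
    if len(raw) < HEADER.size:
        raise ValueError("payload is shorter than an event header")
    timestamp, type_code, server_id, length, next_pos, flags = HEADER.unpack_from(raw)
    return {
        "time_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
        "type_name": EVENT_NAMES.get(type_code, "type %d" % type_code),
        "server_id": server_id,
        "event_length": length,
        "next_position": next_pos,
        "flags": flags,
    }


# Synthetic event: a header followed by 8 placeholder body bytes.
body = b"\x00" * 8
header = HEADER.pack(1250380800, 19, 1, HEADER.size + len(body), 415, 0)
payload = base64.b64encode(header + body).decode("ascii")
info = decode_event_header(payload)
```

Even this much requires knowing the binary layout. The row data that follows the header is packed per column type, which is the part the proposed textual syntax would spell out.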



[Maria-developers] Updated (by Psergey): Change BINLOG statement syntax to be human-readable (46)
by worklog-noreply＠askmonty.org 16 Aug '09

16 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Change BINLOG statement syntax to be human-readable
CREATION DATE..: Sat, 15 Aug 2009, 23:42
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 46 (http://askmonty.org/worklog/?tid=46)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Sun, 16 Aug 2009, 11:13)=-=-
High Level Description modified.
--- /tmp/wklog.46.old.12717     2009-08-16 11:13:40.000000000 +0300
+++ /tmp/wklog.46.new.12717     2009-08-16 11:13:40.000000000 +0300
@@ -5,3 +5,5 @@
 
 This WL task is about making BINLOG statements to be human-readable (either as
 an option or by default
+
+The approach of this WL is to some extent an alternative to WL#38, WL#40, WL41.

-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 11:07)=-=-
Dependency deleted: 48 no longer depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency created: 48 now depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 10:59)=-=-
Dependency deleted: 39 no longer depends on 46

-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 46

-=-=(Psergey - Sat, 15 Aug 2009, 23:43)=-=-
High-Level Specification modified.
--- /tmp/wklog.46.old.17742     2009-08-15 23:43:09.000000000 +0300
+++ /tmp/wklog.46.new.17742     2009-08-15 23:43:09.000000000 +0300
@@ -1 +1,28 @@
 
+Suggestion 1
+------------
+Original syntax suggestion by Kristian:
+
+  BINLOG
+    WITH TIMESTAMP xxx SERVER_ID 1 MASTER_POS 415 FLAGS 0x0
+    TABLE db1.table1 AS 1 COLUMNS (INT NOT NULL, BLOB, VARCHAR(100)) FLAGS 0x0
+    TABLE db2.table2 AS 2 COLUMNS (CHAR(10)) FLAGS 0x0
+    WRITE_ROW INTO db1.table1(1,3) VALUES (42, 'foobar'), (10, NULL) FLAGS 0x2
+    UPDATE_ROW INTO db2.table2 (1) (1) VALUES FROM ('beforeval') TO ('toval'),
+      FROM ('a') TO ('b') FLAGS 0x0
+    DELETE_ROW INTO db2.table2 (1) VALUES ('row_to_delete') FLAGS 0x0;
+
+  This is basically a dump of what is stored in the events, and would be an
+  alternative to BINLOG 'gwWEShMBAA...'.
+
+Feedback and other suggestions
+------------------------------
+* What is the need for WITH TIMESTAMP part? Can't one use a separate
+  SET TIMESTAMP statement?
+
+* mysqlbinlog --base64-output=DECODE-ROWS --verbose already produces something
+  that's close to readable SQL. Can we make it to be regular parseable SQL?
+  + This will be syntax that's familiar to our parser and to the users
+  - A stream of SQL statements will be slower to run than BINLOG statements
+    (due to locking, table open/close, etc). (TODO: is it really slower? we
+    haven't checked).


DESCRIPTION:

One of the great things about mysqlbinlog was that its output was
human-readable SQL, so it was possible to edit it manually or with the help of
scripts. With RBR events and BINLOG 'DpiGShMBAAAALQAAADcBAA...' statements this
is no longer the case.

This WL task is about making BINLOG statements human-readable (either as an
option or by default).

The approach of this WL is to some extent an alternative to WL#38, WL#40, WL41.


HIGH-LEVEL SPECIFICATION:

Suggestion 1
------------
Original syntax suggestion by Kristian:

  BINLOG
    WITH TIMESTAMP xxx SERVER_ID 1 MASTER_POS 415 FLAGS 0x0
    TABLE db1.table1 AS 1 COLUMNS (INT NOT NULL, BLOB, VARCHAR(100)) FLAGS 0x0
    TABLE db2.table2 AS 2 COLUMNS (CHAR(10)) FLAGS 0x0
    WRITE_ROW INTO db1.table1(1,3) VALUES (42, 'foobar'), (10, NULL) FLAGS 0x2
    UPDATE_ROW INTO db2.table2 (1) (1) VALUES FROM ('beforeval') TO ('toval'),
      FROM ('a') TO ('b') FLAGS 0x0
    DELETE_ROW INTO db2.table2 (1) VALUES ('row_to_delete') FLAGS 0x0;

  This is basically a dump of what is stored in the events, and would be an
  alternative to BINLOG 'gwWEShMBAA...'.

Feedback and other suggestions
------------------------------
* What is the need for the WITH TIMESTAMP part? Can't one use a separate
  SET TIMESTAMP statement?

* mysqlbinlog --base64-output=DECODE-ROWS --verbose already produces something
  that's close to readable SQL. Can we make it regular, parseable SQL?
  + This will be syntax that's familiar to our parser and to the users
  - A stream of SQL statements will be slower to run than BINLOG statements
    (due to locking, table open/close, etc). (TODO: is it really slower? we
    haven't checked).


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)



[Maria-developers] Updated (by Psergey): Store in binlog text of statements that caused RBR events (47)
by worklog-noreply＠askmonty.org 16 Aug '09

16 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Store in binlog text of statements that caused RBR events
CREATION DATE..: Sat, 15 Aug 2009, 23:48
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 47 (http://askmonty.org/worklog/?tid=47)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Sun, 16 Aug 2009, 11:08)=-=-
High-Level Specification modified.
--- /tmp/wklog.47.old.12485     2009-08-16 11:08:33.000000000 +0300
+++ /tmp/wklog.47.new.12485     2009-08-16 11:08:33.000000000 +0300
@@ -1 +1,6 @@
 
+First suggestion:
+
+> I think for this we would actually need a new binlog event type
+> (Comment_log_event?). Unless we want to log an empty statement Query_log_event
+> containing only a comment (a bit of a hack).

-=-=(Psergey - Sun, 16 Aug 2009, 00:02)=-=-
Dependency created: 39 now depends on 47


DESCRIPTION:

Store in binlog (and show in mysqlbinlog output) texts of statements that
caused RBR events

This is needed for (list from Monty):
- Easier to understand why updates happened
- Would make it easier to find out where in application things went
  wrong (as you can search for exact strings)
- Allow one to filter things based on comments in the statement.

The cost of this can be that the binlog will be approximately 2x in size
(especially insert of big blob's would be a bit painful), so this should
be an optional feature.


HIGH-LEVEL SPECIFICATION:

First suggestion:

> I think for this we would actually need a new binlog event type
> (Comment_log_event?). Unless we want to log an empty statement Query_log_event
> containing only a comment (a bit of a hack).


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)



[Maria-developers] New (by Psergey): Extra replication tasks (48)
by worklog-noreply＠askmonty.org 16 Aug '09

16 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Extra replication tasks
CREATION DATE..: Sun, 16 Aug 2009, 10:58
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 48 (http://askmonty.org/worklog/?tid=48)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:


DESCRIPTION:

An umbrella task for replication tasks that are nice to do but are not direct
responses for customer requests


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)



[Maria-developers] Rev 2725: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r10-vg/
by Sergey Petrunya 16 Aug '09

16 Aug '09

At file:///home/psergey/dev/maria-5.1-table-elim-r10-vg/

------------------------------------------------------------
revno: 2725
revision-id: psergey(a)askmonty.org-20090816072524-w9fu2hy23pjwlr8z
parent: psergey(a)askmonty.org-20090815153912-q47vfp1j22ilmup2
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r10-vg
timestamp: Sun 2009-08-16 10:25:24 +0300
message:
  MWL#17: Table elimination
  - Fix trivial valgrind failures that shown up after review
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc	2009-08-15 15:39:12 +0000
+++ b/sql/opt_table_elimination.cc	2009-08-16 07:25:24 +0000
@@ -40,13 +40,16 @@
   Table elimination is redone on every PS re-execution.
 */
 
-class Value_dep
+class Value_dep : public Sql_alloc
 {
 public:
   enum {
     VALUE_FIELD,
     VALUE_TABLE,
   } type; /* Type of the object */
+
+  Value_dep(): bound(FALSE), next(NULL)
+  {}
 
   bool bound;
   Value_dep *next;


[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply＠askmonty.org 15 Aug '09

15 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Sun, 16 Aug 2009, 02:13)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.23383     2009-08-16 02:13:54.000000000 +0300
+++ /tmp/wklog.24.new.23383     2009-08-16 02:13:54.000000000 +0300
@@ -125,7 +125,7 @@
 
 The optimizer will generate the plans that only use the "col1=c1" part. The
 right side of the AND will be ignored even if it has good selectivity.
-(Here an imerge for col2=c2 OR col3=c3 won't be built since neither col2=c2 nor
+(Here no imerge for col2=c2 OR col3=c3 will be built since neither col2=c2 nor
 col3=c3 represent index ranges.)
 
 
@@ -199,7 +199,7 @@
 
 O2. "Create index_merge accesses when possible"
   Current tree_or() will not create index_merge access when it could create
-  non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
+  non-index merge access (see DISCARD-IMERGE-2 and its example in the "Problems
   in the current implementation" section). This will be changed to work as
   follows: we will create index_merge made for index scans that didn't have
   their match in the other sel_tree.

-=-=(Guest - Sun, 16 Aug 2009, 01:03)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.20767     2009-08-16 01:03:11.000000000 +0300
+++ /tmp/wklog.24.new.20767     2009-08-16 01:03:11.000000000 +0300
@@ -18,6 +18,8 @@
 
   # a range tree has range access options, possibly for several keys
   range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+    (here range(keyi) may represent ranges not for initial keyi prefixes,
+     but ranges for any infixes for keyi)
 
   # merge tree represents several way to index_merge
   imerge_tree = imerge1 AND imerge2 AND ...
@@ -47,13 +49,13 @@
         R.add(range_union(A.range(i), B.range(i)));
 
       if (R has at least one range access)
-        return R;
+        return R;    // DISCARD-IMERGE-2
       else
       {
         /* could not build any range accesses. construct index_merge */
-        remove non-ranges from A; // DISCARD-IMERGE-2
+        remove non-ranges from A;
         remove non-ranges from B;
-        return new index_merge(A, B);
+        return new index_merge(A, B); // DISCARD-IMERGE-3
       }
     }
     else if (A is range tree and B is index_merge tree (or vice versa))
@@ -65,12 +67,12 @@
           (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
           (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
           ...
-          (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND
+          (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN)
         =
           (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
           (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
           ...
-          (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+          (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N)
 
       Now each line represents an index_merge..
     }
@@ -82,18 +84,18 @@
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN
 
-       -> (discard all imergeA{i=2,3,...} ->        // DISCARD-IMERGE-3
+       -> (discard all imergeA{i=2,3,...} ->        // DISCARD-IMERGE-4
 
        imergeA1
        OR
-       imergeB1 AND imergeB2 AND ... AND imergeBN =
+       imergeB1 =
 
-       = (combine imergeA1 with each of the imergeB{i} ) =
+       = (combine imergeA1 with each of the range_treeB_1{i} ) =
 
-       combine(imergeA1 OR imergeB1) AND
-       combine(imergeA1 OR imergeB2) AND
+       combine(imergeA1 OR range_treeB_11) AND
+       combine(imergeA1 OR range_treeB_12) AND
        ... AND
-       combine(imergeA1 OR imergeBN)
+       combine(imergeA1 OR range_treeB_1N)
     }
   }
 
@@ -109,7 +111,7 @@
 DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
 the WHERE clause has this form (conditions t.badkey may have abritrary form):
 
-  (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+  (t.badkey<c1 AND t.key1=c1) OR (t.key2=c2 AND t.badkey < c2)
 
 DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
 two indexes:
@@ -123,6 +125,8 @@
 
 The optimizer will generate the plans that only use the "col1=c1" part. The
 right side of the AND will be ignored even if it has good selectivity.
+(Here an imerge for col2=c2 OR col3=c3 won't be built since neither col2=c2 nor
+col3=c3 represent index ranges.)
 
 
 2. New implementation

-=-=(Guest - Mon, 20 Jul 2009, 17:13)=-=-
Dependency deleted: 30 no longer depends on 24

-=-=(Guest - Sat, 20 Jun 2009, 09:34)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.21663     2009-06-20 09:34:48.000000000 +0300
+++ /tmp/wklog.24.new.21663     2009-06-20 09:34:48.000000000 +0300
@@ -4,6 +4,7 @@
 2. New implementation
 2.1 New tree_and()
 2.2 New tree_or()
+3. Testing and required coverage
 </contents>
 
 1. Current implementation overview
@@ -240,3 +241,14 @@
 In order to limit the impact of this combinatorial explosion, we will
 introduce a rule that we won't generate more than #defined
 MAX_IMERGE_OPTS options.
+
+3. Testing and required coverage
+================================
+So far could find the following user cases:
+
+* BUG#17259: Query optimizer chooses wrong index
+* BUG#17673: Optimizer does not use Index Merge optimization in some cases
+* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge
+* BUG#30151: optimizer is very reluctant to chose index_merge algorithm
+
+

-=-=(Guest - Thu, 18 Jun 2009, 16:55)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.19152     2009-06-18 16:55:00.000000000 +0300
+++ /tmp/wklog.24.new.19152     2009-06-18 16:55:00.000000000 +0300
@@ -141,13 +141,15 @@
 Operations on SEL_ARG trees will be modified to produce/process the trees of
 this kind:
 
+
 2.1 New tree_and()
 ------------------
 In order not to lose plans, we'll make these changes:
 
-1. Don't remove index_merge part of the tree.
+A1. Don't remove index_merge part of the tree (this will take care of
+    DISCARD-IMERGE-1 problem)
 
-2. Push range conditions down into index_merge trees that may support them.
+A2. Push range conditions down into index_merge trees that may support them.
    if one tree has range(key1) and the other tree has imerge(key1 OR key2)
    then perform an equvalent of this operation:
 
@@ -155,8 +157,86 @@
 
     (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
 
-3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
    concatenate them together.
 
-2.2 New tree_or()
+2.2 New tree_or()
+-----------------
+O1. Dont remove non-range plans:
+  Current tree_or() code will refuse to produce index_merge plans for
+  conditions like
+
+    "t.key1part2=const OR t.key2part1=const"
+
+  (this is marked as DISCARD-IMERGE-3). This was justifed as the left part of
+  the AND condition is not usable for range access, and the operation of
+  tree_and() guaranteed that there was no way it could changed to make a
+  usable range plan. With new tree_and() and rule A2, this is no longer the
+  case. For example for this query:
+
+    (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
+
+  it will construct a
+
+    imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)
+
+  then tree_and() will apply rule A2 to push the range down into index merge
+  and after that we'll have:
+
+    range(t.key1part1=const)
+    imerge(
+      t.key1part2=const AND t.key1part1=const,
+      t.key2part1=const
+    )
+  note that imerge(...) describes a usable index_merge plan and it's possible
+  that it will be the best access path.
+
+O2. "Create index_merge accesses when possible"
+  Current tree_or() will not create index_merge access when it could create
+  non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
+  in the current implementation" section). This will be changed to work as
+  follows: we will create index_merge made for index scans that didn't have
+  their match in the other sel_tree.
+  Ilustrating it with an example:
+
+        | sel_tree_A | sel_tree_B | A or B | include in index_merge?
+  ------+------------+------------+--------+------------------------
+  key1  | cond1      | cond2      | condM  | no
+  key2  | cond3      | cond4      | NULL   | no
+  key3  | cond5      |            |        | yes, A-side
+  key4  | cond6      |            |        | yes, A-side
+  key5  |            | cond7      |        | yes, B-side
+  key6  |            | cond8      |        | yes, B-side
+
+  here we assume that
+  - (cond1 OR cond2) did produce a combined range. Not including them in
+    index_merge.
+  - (cond3 OR cond4) didn't produce a usable range (e.g. they were
+    t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
+    didn't yield any range list)
+  - All other scand didn't have their counterparts, so we'll end up with a
+    SEL_TREE of:
+
+      range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
+  .
+
+O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is
+that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven
+seen any complaints that could be attributed to it.
+If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
+lift it ,and produce a cross-product:
+
+  ((key1p OR key2p) AND (key3p OR key4p))
+  OR
+  ((key5p OR key6p) AND (key7p OR key8p))
+
+  = (key1p OR key2p OR key5p OR key6p) AND // this part is currently
+    (key3p OR key4p OR key5p OR key6p) AND // produced
+
+    (key1p OR key2p OR key5p OR key6p) AND // this part will be added
+    (key3p OR key4p OR key5p OR key6p)     //.
+
+In order to limit the impact of this combinatorial explosion, we will
+introduce a rule that we won't generate more than #defined
+MAX_IMERGE_OPTS options.

-=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.15612     2009-06-18 14:56:09.000000000 +0300
+++ /tmp/wklog.24.new.15612     2009-06-18 14:56:09.000000000 +0300
@@ -1 +1,162 @@
 
+<contents>
+1. Current implementation overview
+1.1. Problems in the current implementation
+2. New implementation
+2.1 New tree_and()
+2.2 New tree_or()
+</contents>
+
+1. Current implementation overview
+==================================
+At the moment, range analyzer works as follows:
+
+SEL_TREE structure represents
+
+  # There are sel_trees, a sel_tree is either range or merge tree
+  sel_tree = range_tree | imerge_tree
+
+  # a range tree has range access options, possibly for several keys
+  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+
+  # merge tree represents several way to index_merge
+  imerge_tree = imerge1 AND imerge2 AND ...
+
+  # a way to do index merge == a set to use of different indexes.
+  imergeX = range_tree1 OR range_tree2 OR ..
+    where no pair of range_treeX have ranges over the same index.
+
+
+  tree_and(A, B)
+  {
+    if (both A and B are range trees)
+      return a range_tree with computed intersection for each range;
+    if (only one of A and B is a range tree)
+      return that tree; // DISCARD-IMERGE-1
+    // at this point both trees are index_merge trees
+    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
+  }
+
+
+  tree_or(A, B)
+  {
+    if (A and B are range trees)
+    {
+      R = new range_tree;
+      for each index i
+        R.add(range_union(A.range(i), B.range(i)));
+
+      if (R has at least one range access)
+        return R;
+      else
+      {
+        /* could not build any range accesses.
construct index_merge */ + remove non-ranges from A; // DISCARD-IMERGE-2 + remove non-ranges from B; + return new index_merge(A, B); + } + } + else if (A is range tree and B is index_merge tree (or vice versa)) + { + Perform this transformation: + + range_treeA // this is A + OR + (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND + (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND + ... + (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND + = + (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND + (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND + ... + (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND + + Now each line represents an index_merge.. + } + else if (both A and B are index_merge trees) + { + Perform this transformation: + + imergeA1 AND imergeA2 AND ... AND imergeAN + OR + imergeB1 AND imergeB2 AND ... AND imergeBN + + -> (discard all imergeA{i=2,3,...} -> // DISCARD-IMERGE-3 + + imergeA1 + OR + imergeB1 AND imergeB2 AND ... AND imergeBN = + + = (combine imergeA1 with each of the imergeB{i} ) = + + combine(imergeA1 OR imergeB1) AND + combine(imergeA1 OR imergeB2) AND + ... AND + combine(imergeA1 OR imergeBN) + } + } + +1.1. Problems in the current implementation +------------------------------------------- +As marked in the code above: + +DISCARD-IMERGE-1 step will cause index_merge option to be discarded when +the WHERE clause has this form: + + (t.key1=c1 OR t.key2=c2) AND t.badkey < c3 + +DISCARD-IMERGE-2 step will cause index_merge option to be discarded when +the WHERE clause has this form (conditions t.badkey may have abritrary form): + + (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2) + +DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are +two indexes: + + INDEX i1(col1, col2), + INDEX i2(col1, col3) + +and this WHERE clause: + + col1=c1 AND (col2=c2 OR col3=c3) + +The optimizer will generate the plans that only use the "col1=c1" part. 
The +right side of the AND will be ignored even if it has good selectivity. + + +2. New implementation +===================== + +<general idea> +* Don't start fighting combinatorial explosion until we've actually got one. +</> + +SEL_TREE structure will be now able to hold both index_merge and range scan +candidates at the same time. That is, + + sel_tree2 = range_tree AND imerge_tree + +where both parts are optional (i.e. can be empty) + +Operations on SEL_ARG trees will be modified to produce/process the trees of +this kind: + +2.1 New tree_and() +------------------ +In order not to lose plans, we'll make these changes: + +1. Don't remove index_merge part of the tree. + +2. Push range conditions down into index_merge trees that may support them. + if one tree has range(key1) and the other tree has imerge(key1 OR key2) + then perform an equvalent of this operation: + + rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) = + + (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2)) + +3. Just as before: if both sel_tree A and sel_tree B have index_merge options, + concatenate them together. + +2.2 New tree_or() -=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=- Dependency created: 30 now depends on 24 -=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=- High-Level Specification modified. --- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.000000000 +0300 +++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.000000000 +0300 @@ -64,6 +64,9 @@ * How strict is the limitation on the form of the WHERE? +* Which version should this be based on? 5.1? Which patches are should be in + (google's/percona's/maria/etc?) + * TODO: The optimizer didn't compare costs of index_merge and range before (ok it did but that was done for accesses to different tables). Will there be any possible gotchas here? -=-=(Guest - Wed, 27 May 2009, 13:59)=-=- Title modified. 
--- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300 +++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300 @@ -1 +1 @@ -index_merge optimizer: dont discard index_merge union strategies when range is available +index_merge: fair choice between index_merge union and range access -=-=(Guest - Tue, 26 May 2009, 13:27)=-=- High-Level Specification modified. --- /tmp/wklog.24.old.305 2009-05-26 13:27:32.000000000 +0300 +++ /tmp/wklog.24.new.305 2009-05-26 13:27:32.000000000 +0300 @@ -1 +1,70 @@ +(Not a ready HLS but draft) +<contents> +Solution overview +Limitations +TODO + +</contents> + +Solution overview +================= +The idea is to delay discarding potential index_merge plans until the point +where it is really necessary. + +This way, we won't have to do much changes in the range analyzer, but will be +able to keep potential index_merge plan just enough so that it's possible to +take it into consideration together with range access plans. + +Since there are no changes in the optimizer, the ability to consider both +range and index_merge options will be limited to WHERE clauses of this form: + + WHERE := range_cond(key1_1) AND + range_cond(key2_1) AND + other_cond AND + index_merge_OR_cond1(key3_1, key3_2, ...) + index_merge_OR_cond2(key4_1, key4_2, ...) + +where + + index_merge_OR_cond{N} := (range_cond(keyN_1) OR + range_cond(keyN_2) OR ...) + + + range_cond(keyX) := condition that allows to construct range access of keyX + and doesn't allow to construct range/index_merge accesses + for any keys of the table in question. + + +For such WHERE clauses, the range analyzer will produce SEL_TREE of this form: + + SEL_TREE( + range(key1_1), + ... + range(key2_1), + SEL_IMERGE( (1) + SEL_TREE(key3_1}) + SEL_TREE(key3_2}) + ... + ) + ... + ) + +which can be used to make a cost-based choice between range and index_merge. 
+
+Limitations
+-----------
+This will not be a full solution in a sense that the range analyzer will not
+be able to produce sel_tree (1) if the WHERE clause is specified in other form
+(e.g. brackets were opened).
+
+TODO
+----
+* is it a problem if there are keys that are referred to both from
+  index_merge and from range access?
+
+* How strict is the limitation on the form of the WHERE?
+
+* TODO: The optimizer didn't compare costs of index_merge and range before (ok
+  it did but that was done for accesses to different tables). Will there be any
+  possible gotchas here?

DESCRIPTION:

Current range optimizer will discard possible index_merge/[sort]union
strategies when there is a possible range plan. This action is a part of
measures we take to avoid combinatorial explosion of possible
range/index_merge strategies.

A bad side effect of this is that for WHERE clauses in form

  t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')

the optimizer will

 - discard union(key2,key3) in favor of range(key1)
 - consider costs of using range(key1) and discard that plan also

and the overall effect is that a possibly poor range access will cause a
possibly good index_merge access not to be considered.

This WL is about lifting this limitation at least for some subset of WHERE
clauses.

HIGH-LEVEL SPECIFICATION:

(Not a ready HLS but draft)

<contents>
Solution overview
Limitations
TODO

</contents>

Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point
where it is really necessary.

This way, we won't have to make many changes in the range analyzer, but will be
able to keep potential index_merge plans just long enough so that it's possible
to take them into consideration together with range access plans.

Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:

  WHERE := range_cond(key1_1) AND
           range_cond(key2_1) AND
           other_cond AND
           index_merge_OR_cond1(key3_1, key3_2, ...) AND
           index_merge_OR_cond2(key4_1, key4_2, ...)

where

  index_merge_OR_cond{N} := (range_cond(keyN_1) OR
                             range_cond(keyN_2) OR ...)


  range_cond(keyX) := condition that allows to construct range access of keyX
                      and doesn't allow to construct range/index_merge accesses
                      for any keys of the table in question.


For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:

  SEL_TREE(
    range(key1_1),
    ...
    range(key2_1),
    SEL_IMERGE(               (1)
      SEL_TREE(key3_1)
      SEL_TREE(key3_2)
      ...
    )
    ...
  )

which can be used to make a cost-based choice between range and index_merge.

Limitations
-----------
This will not be a full solution in the sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in another
form (e.g. if the parentheses were expanded).

TODO
----
* Is it a problem if there are keys that are referred to both from
  index_merge and from range access?

* How strict is the limitation on the form of the WHERE?

* Which version should this be based on? 5.1? Which patches should be in
  (google's/percona's/maria/etc?)

* TODO: The optimizer didn't compare costs of index_merge and range before (ok
  it did but that was done for accesses to different tables). Will there be any
  possible gotchas here?

LOW-LEVEL DESIGN:

<contents>
1. Current implementation overview
1.1. Problems in the current implementation
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
3. Testing and required coverage
</contents>

1. Current implementation overview
==================================
At the moment, range analyzer works as follows:

SEL_TREE structure represents

  # There are sel_trees, a sel_tree is either range or merge tree
  sel_tree = range_tree | imerge_tree

  # a range tree has range access options, possibly for several keys
  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
  (here range(keyi) may represent ranges not for initial keyi prefixes,
   but ranges for any infixes for keyi)

  # merge tree represents several ways to do index_merge
  imerge_tree = imerge1 AND imerge2 AND ...

  # a way to do index merge == a set of different indexes to use.
  imergeX = range_tree1 OR range_tree2 OR ..
    where no pair of range_treeX have ranges over the same index.


  tree_and(A, B)
  {
    if (both A and B are range trees)
      return a range_tree with computed intersection for each range;
    if (only one of A and B is a range tree)
      return that tree; // DISCARD-IMERGE-1
    // at this point both trees are index_merge trees
    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
  }


  tree_or(A, B)
  {
    if (A and B are range trees)
    {
      R = new range_tree;
      for each index i
        R.add(range_union(A.range(i), B.range(i)));

      if (R has at least one range access)
        return R;    // DISCARD-IMERGE-2
      else
      {
        /* could not build any range accesses. construct index_merge */
        remove non-ranges from A;
        remove non-ranges from B;
        return new index_merge(A, B);  // DISCARD-IMERGE-3
      }
    }
    else if (A is range tree and B is index_merge tree (or vice versa))
    {
      Perform this transformation:

        range_treeA    // this is A
        OR
        (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
        (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
        ...
        (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_KN)
       =
        (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
        (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
        ...
        (range_treeA OR range_treeB_K1 OR ... OR range_treeB_KN)

      Now each line represents an index_merge.
    }
    else if (both A and B are index_merge trees)
    {
      Perform this transformation:

        imergeA1 AND imergeA2 AND ... AND imergeAN
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN

        -> (discard all imergeA{i=2,3,...}) ->   // DISCARD-IMERGE-4

        imergeA1
        OR
        imergeB1 =

        = (combine imergeA1 with each of the range_treeB_1{i} ) =

        combine(imergeA1 OR range_treeB_11) AND
        combine(imergeA1 OR range_treeB_12) AND
        ... AND
        combine(imergeA1 OR range_treeB_1N)
    }
  }

1.1. Problems in the current implementation
-------------------------------------------
As marked in the code above:

DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
the WHERE clause has this form:

  (t.key1=c1 OR t.key2=c2) AND t.badkey < c3

DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
the WHERE clause has this form (conditions on t.badkey may have arbitrary
form):

  (t.badkey<c1 AND t.key1=c1) OR (t.key2=c2 AND t.badkey < c2)

DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
two indexes:

  INDEX i1(col1, col2),
  INDEX i2(col1, col3)

and this WHERE clause:

  col1=c1 AND (col2=c2 OR col3=c3)

The optimizer will generate the plans that only use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.
(Here no imerge for col2=c2 OR col3=c3 will be built since neither col2=c2 nor
col3=c3 represents an index range.)


2. New implementation
=====================

<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>

SEL_TREE structure will now be able to hold both index_merge and range scan
candidates at the same time. That is,

  sel_tree2 = range_tree AND imerge_tree

where both parts are optional (i.e. can be empty)

Operations on SEL_ARG trees will be modified to produce/process the trees of
this kind:


2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:

A1. Don't remove index_merge part of the tree (this will take care of
    DISCARD-IMERGE-1 problem)

A2. Push range conditions down into index_merge trees that may support them.
    If one tree has range(key1) and the other tree has imerge(key1 OR key2)
    then perform an equivalent of this operation:

     rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =

     (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))

A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
    concatenate them together.

2.2 New tree_or()
-----------------
O1. Don't remove non-range plans:
  Current tree_or() code will refuse to produce index_merge plans for
  conditions like

   "t.key1part2=const OR t.key2part1=const"

  (this is marked as DISCARD-IMERGE-3). This was justified as the left part of
  the AND condition is not usable for range access, and the operation of
  tree_and() guaranteed that there was no way it could be changed to make a
  usable range plan. With new tree_and() and rule A2, this is no longer the
  case. For example for this query:

    (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const

  it will construct a

    imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)

  then tree_and() will apply rule A2 to push the range down into index merge
  and after that we'll have:

    range(t.key1part1=const)
    imerge(
      t.key1part2=const AND t.key1part1=const,
      t.key2part1=const
    )
  Note that imerge(...) describes a usable index_merge plan and it's possible
  that it will be the best access path.

O2. "Create index_merge accesses when possible"
  Current tree_or() will not create index_merge access when it could create
  non-index merge access (see DISCARD-IMERGE-2 and its example in the "Problems
  in the current implementation" section). This will be changed to work as
  follows: we will create an index_merge from the index scans that didn't have
  a match in the other sel_tree.
  Illustrating it with an example:

        | sel_tree_A | sel_tree_B | A or B | include in index_merge?
  ------+------------+------------+--------+------------------------
  key1  | cond1      | cond2      | condM  | no
  key2  | cond3      | cond4      | NULL   | no
  key3  | cond5      |            |        | yes, A-side
  key4  | cond6      |            |        | yes, A-side
  key5  |            | cond7      |        | yes, B-side
  key6  |            | cond8      |        | yes, B-side

  here we assume that
  - (cond1 OR cond2) did produce a combined range. Not including them in
    index_merge.
  - (cond3 OR cond4) didn't produce a usable range (e.g. they were
    t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
    didn't yield any range list)
  - All other scans didn't have their counterparts, so we'll end up with a
    SEL_TREE of:

      range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))

O4. There is no O4. DISCARD-IMERGE-4 will remain there. The idea is
that although DISCARD-IMERGE-4 does discard plans, so far we haven't
seen any complaints that could be attributed to it.
If we face the need to lift DISCARD-IMERGE-4, our answer will be to
lift it and produce a cross-product:

  ((key1p OR key2p) AND (key3p OR key4p))
  OR
  ((key5p OR key6p) AND (key7p OR key8p))

  = (key1p OR key2p OR key5p OR key6p) AND  // this part is currently
    (key3p OR key4p OR key5p OR key6p) AND  // produced

    (key1p OR key2p OR key7p OR key8p) AND  // this part will be added
    (key3p OR key4p OR key7p OR key8p)      //.

In order to limit the impact of this combinatorial explosion, we will
introduce a rule that we won't generate more than #defined
MAX_IMERGE_OPTS options.

3. Testing and required coverage
================================
So far we could find the following user cases:

* BUG#17259: Query optimizer chooses wrong index
* BUG#17673: Optimizer does not use Index Merge optimization in some cases
* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge
* BUG#30151: optimizer is very reluctant to chose index_merge algorithm

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
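
To make the difference between the current tree_and() (DISCARD-IMERGE-1) and
the proposed one (rules A1-A3) easier to follow, here is a toy model in Python.
It is only a sketch under loud assumptions: SelTree, and_ranges, push_down,
old_tree_and and new_tree_and are hypothetical names invented for this
illustration, conditions are plain strings, and nothing here corresponds to
the real SEL_TREE/SEL_ARG code in the server. It runs the example query from
rule O1.

```python
from dataclasses import dataclass, field


@dataclass
class SelTree:
    # range part: index name -> condition text (range(key1) AND range(key2) ...)
    ranges: dict = field(default_factory=dict)
    # imerge part: list of imerges; each imerge is a list of OR-ed range trees
    imerges: list = field(default_factory=list)


def and_ranges(a, b):
    """Intersect two range trees key by key (conditions are joined as text)."""
    out = dict(a)
    for key, cond in b.items():
        out[key] = f"({out[key]} AND {cond})" if key in out else cond
    return out


def old_tree_and(a, b):
    """Toy version of the current behaviour described in section 1."""
    a_is_range, b_is_range = not a.imerges, not b.imerges
    if a_is_range and b_is_range:
        return SelTree(and_ranges(a.ranges, b.ranges))
    if a_is_range != b_is_range:
        return a if a_is_range else b  # DISCARD-IMERGE-1
    return SelTree({}, a.imerges + b.imerges)


def push_down(imerges, ranges):
    """Rule A2: AND each range into the imerge branches over the same index."""
    return [
        [and_ranges(rt, {k: c for k, c in ranges.items() if k in rt}) for rt in imerge]
        for imerge in imerges
    ]


def new_tree_and(a, b):
    """Toy version of the proposed behaviour: keep both parts (A1, A2, A3)."""
    return SelTree(
        and_ranges(a.ranges, b.ranges),
        push_down(a.imerges, b.ranges) + push_down(b.imerges, a.ranges),
    )


# (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
A = SelTree({}, [[{"key1": "key1part2=const"}, {"key2": "key2part1=const"}]])
B = SelTree({"key1": "key1part1=const"})

print(old_tree_and(A, B))  # only the range on key1 survives
print(new_tree_and(A, B))  # range on key1 plus an imerge with the range pushed in
```

In this model the old function returns only range(key1), while the new one
keeps the imerge and turns its key1 branch into
"key1part2=const AND key1part1=const", which is the shape shown under O1.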


[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply＠askmonty.org 15 Aug '09

15 Aug '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: index_merge: fair choice between index_merge union and range access CREATION DATE..: Tue, 26 May 2009, 12:10 SUPERVISOR.....: Monty IMPLEMENTOR....: Psergey COPIES TO......: Psergey CATEGORY.......: Server-Sprint TASK ID........: 24 (http://askmonty.org/worklog/?tid=24) VERSION........: Server-9.x STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Sun, 16 Aug 2009, 02:13)=-=- Low Level Design modified. --- /tmp/wklog.24.old.23383 2009-08-16 02:13:54.000000000 +0300 +++ /tmp/wklog.24.new.23383 2009-08-16 02:13:54.000000000 +0300 @@ -125,7 +125,7 @@ The optimizer will generate the plans that only use the "col1=c1" part. The right side of the AND will be ignored even if it has good selectivity. -(Here an imerge for col2=c2 OR col3=c3 won't be built since neither col2=c2 nor +(Here no imerge for col2=c2 OR col3=c3 will be built since neither col2=c2 nor col3=c3 represent index ranges.) @@ -199,7 +199,7 @@ O2. "Create index_merge accesses when possible" Current tree_or() will not create index_merge access when it could create - non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems + non-index merge access (see DISCARD-IMERGE-2 and its example in the "Problems in the current implementation" section). This will be changed to work as follows: we will create index_merge made for index scans that didn't have their match in the other sel_tree. -=-=(Guest - Sun, 16 Aug 2009, 01:03)=-=- Low Level Design modified. --- /tmp/wklog.24.old.20767 2009-08-16 01:03:11.000000000 +0300 +++ /tmp/wklog.24.new.20767 2009-08-16 01:03:11.000000000 +0300 @@ -18,6 +18,8 @@ # a range tree has range access options, possibly for several keys range_tree = range(key1) AND range(key2) AND ... 
AND range(keyN); + (here range(keyi) may represent ranges not for initial keyi prefixes, + but ranges for any infixes for keyi) # merge tree represents several way to index_merge imerge_tree = imerge1 AND imerge2 AND ... @@ -47,13 +49,13 @@ R.add(range_union(A.range(i), B.range(i))); if (R has at least one range access) - return R; + return R; // DISCARD-IMERGE-2 else { /* could not build any range accesses. construct index_merge */ - remove non-ranges from A; // DISCARD-IMERGE-2 + remove non-ranges from A; remove non-ranges from B; - return new index_merge(A, B); + return new index_merge(A, B); // DISCARD-IMERGE-3 } } else if (A is range tree and B is index_merge tree (or vice versa)) @@ -65,12 +67,12 @@ (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND ... - (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND + (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) = (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND ... - (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND + (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) Now each line represents an index_merge.. } @@ -82,18 +84,18 @@ OR imergeB1 AND imergeB2 AND ... AND imergeBN - -> (discard all imergeA{i=2,3,...} -> // DISCARD-IMERGE-3 + -> (discard all imergeA{i=2,3,...} -> // DISCARD-IMERGE-4 imergeA1 OR - imergeB1 AND imergeB2 AND ... AND imergeBN = + imergeB1 = - = (combine imergeA1 with each of the imergeB{i} ) = + = (combine imergeA1 with each of the range_treeB_1{i} ) = - combine(imergeA1 OR imergeB1) AND - combine(imergeA1 OR imergeB2) AND + combine(imergeA1 OR range_treeB_11) AND + combine(imergeA1 OR range_treeB_12) AND ... 
AND - combine(imergeA1 OR imergeBN) + combine(imergeA1 OR range_treeB_1N) } } @@ -109,7 +111,7 @@ DISCARD-IMERGE-2 step will cause index_merge option to be discarded when the WHERE clause has this form (conditions t.badkey may have abritrary form): - (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2) + (t.badkey<c1 AND t.key1=c1) OR (t.key2=c2 AND t.badkey < c2) DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are two indexes: @@ -123,6 +125,8 @@ The optimizer will generate the plans that only use the "col1=c1" part. The right side of the AND will be ignored even if it has good selectivity. +(Here an imerge for col2=c2 OR col3=c3 won't be built since neither col2=c2 nor +col3=c3 represent index ranges.) 2. New implementation -=-=(Guest - Mon, 20 Jul 2009, 17:13)=-=- Dependency deleted: 30 no longer depends on 24 -=-=(Guest - Sat, 20 Jun 2009, 09:34)=-=- Low Level Design modified. --- /tmp/wklog.24.old.21663 2009-06-20 09:34:48.000000000 +0300 +++ /tmp/wklog.24.new.21663 2009-06-20 09:34:48.000000000 +0300 @@ -4,6 +4,7 @@ 2. New implementation 2.1 New tree_and() 2.2 New tree_or() +3. Testing and required coverage </contents> 1. Current implementation overview @@ -240,3 +241,14 @@ In order to limit the impact of this combinatorial explosion, we will introduce a rule that we won't generate more than #defined MAX_IMERGE_OPTS options. + +3. Testing and required coverage +================================ +So far could find the following user cases: + +* BUG#17259: Query optimizer chooses wrong index +* BUG#17673: Optimizer does not use Index Merge optimization in some cases +* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge +* BUG#30151: optimizer is very reluctant to chose index_merge algorithm + + -=-=(Guest - Thu, 18 Jun 2009, 16:55)=-=- Low Level Design modified. 
--- /tmp/wklog.24.old.19152 2009-06-18 16:55:00.000000000 +0300 +++ /tmp/wklog.24.new.19152 2009-06-18 16:55:00.000000000 +0300 @@ -141,13 +141,15 @@ Operations on SEL_ARG trees will be modified to produce/process the trees of this kind: + 2.1 New tree_and() ------------------ In order not to lose plans, we'll make these changes: -1. Don't remove index_merge part of the tree. +A1. Don't remove index_merge part of the tree (this will take care of + DISCARD-IMERGE-1 problem) -2. Push range conditions down into index_merge trees that may support them. +A2. Push range conditions down into index_merge trees that may support them. if one tree has range(key1) and the other tree has imerge(key1 OR key2) then perform an equvalent of this operation: @@ -155,8 +157,86 @@ (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2)) -3. Just as before: if both sel_tree A and sel_tree B have index_merge options, +A3. Just as before: if both sel_tree A and sel_tree B have index_merge options, concatenate them together. -2.2 New tree_or() +2.2 New tree_or() +----------------- +O1. Dont remove non-range plans: + Current tree_or() code will refuse to produce index_merge plans for + conditions like + + "t.key1part2=const OR t.key2part1=const" + + (this is marked as DISCARD-IMERGE-3). This was justifed as the left part of + the AND condition is not usable for range access, and the operation of + tree_and() guaranteed that there was no way it could changed to make a + usable range plan. With new tree_and() and rule A2, this is no longer the + case. For example for this query: + + (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const + + it will construct a + + imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const) + + then tree_and() will apply rule A2 to push the range down into index merge + and after that we'll have: + + range(t.key1part1=const) + imerge( + t.key1part2=const AND t.key1part1=const, + t.key2part1=const + ) + note that imerge(...) 
describes a usable index_merge plan and it's possible + that it will be the best access path. + +O2. "Create index_merge accesses when possible" + Current tree_or() will not create index_merge access when it could create + non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems + in the current implementation" section). This will be changed to work as + follows: we will create index_merge made for index scans that didn't have + their match in the other sel_tree. + Ilustrating it with an example: + + | sel_tree_A | sel_tree_B | A or B | include in index_merge? + ------+------------+------------+--------+------------------------ + key1 | cond1 | cond2 | condM | no + key2 | cond3 | cond4 | NULL | no + key3 | cond5 | | | yes, A-side + key4 | cond6 | | | yes, A-side + key5 | | cond7 | | yes, B-side + key6 | | cond8 | | yes, B-side + + here we assume that + - (cond1 OR cond2) did produce a combined range. Not including them in + index_merge. + - (cond3 OR cond4) didn't produce a usable range (e.g. they were + t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them + didn't yield any range list) + - All other scand didn't have their counterparts, so we'll end up with a + SEL_TREE of: + + range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8)) + . + +O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is +that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven +seen any complaints that could be attributed to it. +If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to +lift it ,and produce a cross-product: + + ((key1p OR key2p) AND (key3p OR key4p)) + OR + ((key5p OR key6p) AND (key7p OR key8p)) + + = (key1p OR key2p OR key5p OR key6p) AND // this part is currently + (key3p OR key4p OR key5p OR key6p) AND // produced + + (key1p OR key2p OR key5p OR key6p) AND // this part will be added + (key3p OR key4p OR key5p OR key6p) //. 
+
+In order to limit the impact of this combinatorial explosion, we will
+introduce a rule that we won't generate more than #defined
+MAX_IMERGE_OPTS options.

-=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.15612     2009-06-18 14:56:09.000000000 +0300
+++ /tmp/wklog.24.new.15612     2009-06-18 14:56:09.000000000 +0300
@@ -1 +1,162 @@
+<contents>
+1. Current implementation overview
+1.1. Problems in the current implementation
+2. New implementation
+2.1 New tree_and()
+2.2 New tree_or()
+</contents>
+
+1. Current implementation overview
+==================================
+At the moment, range analyzer works as follows:
+
+SEL_TREE structure represents
+
+  # There are sel_trees, a sel_tree is either range or merge tree
+  sel_tree = range_tree | imerge_tree
+
+  # a range tree has range access options, possibly for several keys
+  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+
+  # merge tree represents several way to index_merge
+  imerge_tree = imerge1 AND imerge2 AND ...
+
+  # a way to do index merge == a set to use of different indexes.
+  imergeX = range_tree1 OR range_tree2 OR ..
+    where no pair of range_treeX have ranges over the same index.
+
+
+  tree_and(A, B)
+  {
+    if (both A and B are range trees)
+      return a range_tree with computed intersection for each range;
+    if (only one of A and B is a range tree)
+      return that tree; // DISCARD-IMERGE-1
+    // at this point both trees are index_merge trees
+    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
+  }
+
+
+  tree_or(A, B)
+  {
+    if (A and B are range trees)
+    {
+      R = new range_tree;
+      for each index i
+        R.add(range_union(A.range(i), B.range(i)));
+
+      if (R has at least one range access)
+        return R;
+      else
+      {
+        /* could not build any range accesses. construct index_merge */
+        remove non-ranges from A; // DISCARD-IMERGE-2
+        remove non-ranges from B;
+        return new index_merge(A, B);
+      }
+    }
+    else if (A is range tree and B is index_merge tree (or vice versa))
+    {
+      Perform this transformation:
+
+      range_treeA    // this is A
+      OR
+      (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
+      (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
+      ...
+      (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND
+      =
+      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+      (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
+      ...
+      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+
+      Now each line represents an index_merge..
+    }
+    else if (both A and B are index_merge trees)
+    {
+      Perform this transformation:
+
+      imergeA1 AND imergeA2 AND ... AND imergeAN
+      OR
+      imergeB1 AND imergeB2 AND ... AND imergeBN
+
+      -> (discard all imergeA{i=2,3,...} ->   // DISCARD-IMERGE-3
+
+      imergeA1
+      OR
+      imergeB1 AND imergeB2 AND ... AND imergeBN =
+
+      = (combine imergeA1 with each of the imergeB{i} ) =
+
+      combine(imergeA1 OR imergeB1) AND
+      combine(imergeA1 OR imergeB2) AND
+      ... AND
+      combine(imergeA1 OR imergeBN)
+    }
+  }
+
+1.1. Problems in the current implementation
+-------------------------------------------
+As marked in the code above:
+
+DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
+the WHERE clause has this form:
+
+  (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
+
+DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
+the WHERE clause has this form (conditions t.badkey may have abritrary form):
+
+  (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+
+DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
+two indexes:
+
+  INDEX i1(col1, col2),
+  INDEX i2(col1, col3)
+
+and this WHERE clause:
+
+  col1=c1 AND (col2=c2 OR col3=c3)
+
+The optimizer will generate the plans that only use the "col1=c1" part. The
+right side of the AND will be ignored even if it has good selectivity.
+
+
+2. New implementation
+=====================
+
+<general idea>
+* Don't start fighting combinatorial explosion until we've actually got one.
+</>
+
+SEL_TREE structure will be now able to hold both index_merge and range scan
+candidates at the same time. That is,
+
+  sel_tree2 = range_tree AND imerge_tree
+
+where both parts are optional (i.e. can be empty)
+
+Operations on SEL_ARG trees will be modified to produce/process the trees of
+this kind:
+
+2.1 New tree_and()
+------------------
+In order not to lose plans, we'll make these changes:
+
+1. Don't remove index_merge part of the tree.
+
+2. Push range conditions down into index_merge trees that may support them.
+   if one tree has range(key1) and the other tree has imerge(key1 OR key2)
+   then perform an equvalent of this operation:
+
+     rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
+
+     (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
+
+3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+   concatenate them together.
+
+2.2 New tree_or()

-=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=-
Dependency created: 30 now depends on 24

-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580     2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580     2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
 
 * How strict is the limitation on the form of the WHERE?
 
+* Which version should this be based on? 5.1? Which patches are should be in
+  (google's/percona's/maria/etc?)
+
 * TODO: The optimizer didn't compare costs of index_merge and range before (ok
   it did but that was done for accesses to different tables). Will there be any
   possible gotchas here?

-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498      2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498      2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access

-=-=(Guest - Tue, 26 May 2009, 13:27)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.305       2009-05-26 13:27:32.000000000 +0300
+++ /tmp/wklog.24.new.305       2009-05-26 13:27:32.000000000 +0300
@@ -1 +1,70 @@
+(Not a ready HLS but draft)
+<contents>
+Solution overview
+Limitations
+TODO
+
+</contents>
+
+Solution overview
+=================
+The idea is to delay discarding potential index_merge plans until the point
+where it is really necessary.
+
+This way, we won't have to do much changes in the range analyzer, but will be
+able to keep potential index_merge plan just enough so that it's possible to
+take it into consideration together with range access plans.
+
+Since there are no changes in the optimizer, the ability to consider both
+range and index_merge options will be limited to WHERE clauses of this form:
+
+  WHERE := range_cond(key1_1) AND
+           range_cond(key2_1) AND
+           other_cond AND
+           index_merge_OR_cond1(key3_1, key3_2, ...)
+           index_merge_OR_cond2(key4_1, key4_2, ...)
+
+where
+
+  index_merge_OR_cond{N} := (range_cond(keyN_1) OR
+                             range_cond(keyN_2) OR ...)
+
+
+  range_cond(keyX) := condition that allows to construct range access of keyX
+                      and doesn't allow to construct range/index_merge accesses
+                      for any keys of the table in question.
+
+
+For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
+
+  SEL_TREE(
+    range(key1_1),
+    ...
+    range(key2_1),
+    SEL_IMERGE(              (1)
+      SEL_TREE(key3_1})
+      SEL_TREE(key3_2})
+      ...
+    )
+    ...
+  )
+
+which can be used to make a cost-based choice between range and index_merge.
+
+Limitations
+-----------
+This will not be a full solution in a sense that the range analyzer will not
+be able to produce sel_tree (1) if the WHERE clause is specified in other form
+(e.g. brackets were opened).
+
+TODO
+----
+* is it a problem if there are keys that are referred to both from
+  index_merge and from range access?
+
+* How strict is the limitation on the form of the WHERE?
+
+* TODO: The optimizer didn't compare costs of index_merge and range before (ok
+  it did but that was done for accesses to different tables). Will there be any
+  possible gotchas here?



DESCRIPTION:

The current range optimizer will discard possible index_merge/[sort]union
strategies when there is a possible range plan. This action is a part of the
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.

A bad side effect of this is that for WHERE clauses of the form

  t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')

the optimizer will

 - discard union(key2,key3) in favor of range(key1)
 - consider costs of using range(key1) and discard that plan also

and the overall effect is that a possibly poor range access will cause a
possibly good index_merge access not to be considered.

This WL is about lifting this limitation at least for some subset of WHERE
clauses.
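
The effect described above can be shown with a toy model. This is only a
sketch in Python: the cost numbers and the helper names (candidates_current,
candidates_proposed, cheapest) are invented for the illustration and do not
correspond to anything in the server code. The point is only that a plan
which is discarded before costing can never be chosen.

```python
# Hypothetical cost estimates for the WHERE clause
#   t.key1='very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')
# All numbers are invented for illustration only.
TABLE_SCAN = ("table scan", 100_000)
RANGE_PLANS = [("range(key1)", 500_000)]
IMERGE_PLANS = [("union(key2,key3)", 40)]


def candidates_current(range_plans, imerge_plans):
    """Model of the described behaviour: index_merge candidates are dropped
    as soon as any range candidate exists."""
    return list(range_plans) if range_plans else list(imerge_plans)


def candidates_proposed(range_plans, imerge_plans):
    """Model of the goal of this WL: keep both kinds of candidates."""
    return list(range_plans) + list(imerge_plans)


def cheapest(plans):
    """Pick the lowest-cost plan; a table scan is always available."""
    return min([TABLE_SCAN] + plans, key=lambda plan: plan[1])


print(cheapest(candidates_current(RANGE_PLANS, IMERGE_PLANS))[0])   # table scan
print(cheapest(candidates_proposed(RANGE_PLANS, IMERGE_PLANS))[0])  # union(key2,key3)
```

In the first call the union plan was thrown away before costing, so the poor
range plan loses to the table scan; in the second call the union plan is still
available and wins on cost.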
Since there are no changes in the optimizer, the ability to consider both range and index_merge options will be limited to WHERE clauses of this form: WHERE := range_cond(key1_1) AND range_cond(key2_1) AND other_cond AND index_merge_OR_cond1(key3_1, key3_2, ...) index_merge_OR_cond2(key4_1, key4_2, ...) where index_merge_OR_cond{N} := (range_cond(keyN_1) OR range_cond(keyN_2) OR ...) range_cond(keyX) := condition that allows to construct range access of keyX and doesn't allow to construct range/index_merge accesses for any keys of the table in question. For such WHERE clauses, the range analyzer will produce SEL_TREE of this form: SEL_TREE( range(key1_1), ... range(key2_1), SEL_IMERGE( (1) SEL_TREE(key3_1}) SEL_TREE(key3_2}) ... ) ... ) which can be used to make a cost-based choice between range and index_merge. Limitations ----------- This will not be a full solution in a sense that the range analyzer will not be able to produce sel_tree (1) if the WHERE clause is specified in other form (e.g. brackets were opened). TODO ---- * is it a problem if there are keys that are referred to both from index_merge and from range access? * How strict is the limitation on the form of the WHERE? * Which version should this be based on? 5.1? Which patches are should be in (google's/percona's/maria/etc?) * TODO: The optimizer didn't compare costs of index_merge and range before (ok it did but that was done for accesses to different tables). Will there be any possible gotchas here? LOW-LEVEL DESIGN: <contents> 1. Current implementation overview 1.1. Problems in the current implementation 2. New implementation 2.1 New tree_and() 2.2 New tree_or() 3. Testing and required coverage </contents> 1. 
Current implementation overview ================================== At the moment, range analyzer works as follows: SEL_TREE structure represents # There are sel_trees, a sel_tree is either range or merge tree sel_tree = range_tree | imerge_tree # a range tree has range access options, possibly for several keys range_tree = range(key1) AND range(key2) AND ... AND range(keyN); (here range(keyi) may represent ranges not for initial keyi prefixes, but ranges for any infixes for keyi) # merge tree represents several way to index_merge imerge_tree = imerge1 AND imerge2 AND ... # a way to do index merge == a set to use of different indexes. imergeX = range_tree1 OR range_tree2 OR .. where no pair of range_treeX have ranges over the same index. tree_and(A, B) { if (both A and B are range trees) return a range_tree with computed intersection for each range; if (only one of A and B is a range tree) return that tree; // DISCARD-IMERGE-1 // at this point both trees are index_merge trees return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN); } tree_or(A, B) { if (A and B are range trees) { R = new range_tree; for each index i R.add(range_union(A.range(i), B.range(i))); if (R has at least one range access) return R; // DISCARD-IMERGE-2 else { /* could not build any range accesses. construct index_merge */ remove non-ranges from A; remove non-ranges from B; return new index_merge(A, B); // DISCARD-IMERGE-3 } } else if (A is range tree and B is index_merge tree (or vice versa)) { Perform this transformation: range_treeA // this is A OR (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND ... (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) = (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND ... (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) Now each line represents an index_merge.. 
    }
    else if (both A and B are index_merge trees)
    {
      Perform this transformation:

      imergeA1 AND imergeA2 AND ... AND imergeAN
      OR
      imergeB1 AND imergeB2 AND ... AND imergeBN

      -> (discard all imergeA{i=2,3,...}) ->   // DISCARD-IMERGE-4

      imergeA1
      OR
      imergeB1 =

      = (combine imergeA1 with each of the range_treeB_1{i} ) =

      combine(imergeA1 OR range_treeB_11) AND
      combine(imergeA1 OR range_treeB_12) AND
      ... AND
      combine(imergeA1 OR range_treeB_1N)
    }
  }

1.1. Problems in the current implementation
-------------------------------------------
As marked in the code above:

DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
the WHERE clause has this form:

  (t.key1=c1 OR t.key2=c2) AND t.badkey < c3

DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
the WHERE clause has this form (conditions t.badkey may have arbitrary form):

  (t.badkey<c1 AND t.key1=c1) OR (t.key2=c2 AND t.badkey < c2)

DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
two indexes:

  INDEX i1(col1, col2),
  INDEX i2(col1, col3)

and this WHERE clause:

  col1=c1 AND (col2=c2 OR col3=c3)

The optimizer will generate the plans that only use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.
(Here no imerge for col2=c2 OR col3=c3 will be built since neither col2=c2 nor
col3=c3 represent index ranges.)


2. New implementation
=====================

<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>

SEL_TREE structure will be now able to hold both index_merge and range scan
candidates at the same time. That is,

  sel_tree2 = range_tree AND imerge_tree

where both parts are optional (i.e. can be empty)

Operations on SEL_ARG trees will be modified to produce/process the trees of
this kind:


2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:

A1. Don't remove index_merge part of the tree (this will take care of
    DISCARD-IMERGE-1 problem)

A2. Push range conditions down into index_merge trees that may support them.
    if one tree has range(key1) and the other tree has imerge(key1 OR key2)
    then perform an equivalent of this operation:

      rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =

      (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))

A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
    concatenate them together.

2.2 New tree_or()
-----------------
O1. Don't remove non-range plans:
    Current tree_or() code will refuse to produce index_merge plans for
    conditions like

      "t.key1part2=const OR t.key2part1=const"

    (this is marked as DISCARD-IMERGE-3). This was justified as the left part of
    the OR condition is not usable for range access, and the operation of
    tree_and() guaranteed that there was no way it could be changed to make a
    usable range plan. With new tree_and() and rule A2, this is no longer the
    case. For example for this query:

      (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const

    it will construct an

      imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)

    then tree_and() will apply rule A2 to push the range down into index merge
    and after that we'll have:

      range(t.key1part1=const)
      imerge(
         t.key1part2=const AND t.key1part1=const,
         t.key2part1=const
      )
    note that imerge(...) describes a usable index_merge plan and it's possible
    that it will be the best access path.

O2. "Create index_merge accesses when possible"
    Current tree_or() will not create index_merge access when it could create
    non-index merge access (see DISCARD-IMERGE-2 and its example in the "Problems
    in the current implementation" section). This will be changed to work as
    follows: we will create an index_merge from the index scans that have no
    counterpart in the other sel_tree.
    Illustrating it with an example:

          | sel_tree_A | sel_tree_B | A or B | include in index_merge?
    ------+------------+------------+--------+------------------------
    key1  | cond1      | cond2      | condM  | no
    key2  | cond3      | cond4      | NULL   | no
    key3  | cond5      |            |        | yes, A-side
    key4  | cond6      |            |        | yes, A-side
    key5  |            | cond7      |        | yes, B-side
    key6  |            | cond8      |        | yes, B-side

    here we assume that
    - (cond1 OR cond2) did produce a combined range. Not including them in
      index_merge.
    - (cond3 OR cond4) didn't produce a usable range (e.g. they were
      t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
      didn't yield any range list)
    - All other scans didn't have their counterparts, so we'll end up with a
      SEL_TREE of:

        range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))

O4. There is no O4. DISCARD-IMERGE-4 will remain there. The idea is
that although DISCARD-IMERGE-4 does discard plans, so far we haven't
seen any complaints that could be attributed to it.
If we face the need to lift DISCARD-IMERGE-4, our answer will be to
lift it and produce a cross-product:

   ((key1p OR key2p) AND (key3p OR key4p))
   OR
   ((key5p OR key6p) AND (key7p OR key8p))

   = (key1p OR key2p OR key5p OR key6p) AND  // this part is currently
     (key3p OR key4p OR key5p OR key6p) AND  // produced

     (key1p OR key2p OR key7p OR key8p) AND  // this part will be added
     (key3p OR key4p OR key7p OR key8p)      //.

In order to limit the impact of this combinatorial explosion, we will
introduce a rule that we won't generate more than #defined
MAX_IMERGE_OPTS options.

3. Testing and required coverage
================================
So far we could find the following user cases:

* BUG#17259: Query optimizer chooses wrong index
* BUG#17673: Optimizer does not use Index Merge optimization in some cases
* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge
* BUG#30151: optimizer is very reluctant to chose index_merge algorithm


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
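
Rules A1-A3 of the new tree_and() and the capped cross-product discussed under
O4 in the Low-Level Design above can be tried out on a toy model. The following
is a minimal Python sketch, not the server implementation: the SelTree class,
the representation of conditions as sets of strings, the function names and the
value 3 for MAX_IMERGE_OPTS are all assumptions made for this example (the text
only names the constant). Real SEL_ARG interval intersection is replaced by
simply collecting the ANDed condition strings.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical value: the design only names the constant, not its value.
MAX_IMERGE_OPTS = 3


@dataclass
class SelTree:
    """Toy sel_tree2 = range_tree AND imerge_tree; both parts are optional."""
    # index name -> set of conditions ANDed together on that index
    ranges: dict = field(default_factory=dict)
    # imerges are ANDed; inside one imerge the per-index branches are ORed
    imerges: list = field(default_factory=list)


def tree_and(a, b):
    """Rules A1-A3 on the toy representation."""
    ranges = {
        key: a.ranges.get(key, frozenset()) | b.ranges.get(key, frozenset())
        for key in a.ranges.keys() | b.ranges.keys()
    }
    imerges = []
    for imerge in a.imerges + b.imerges:  # A1: keep imerges, A3: concatenate
        imerges.append({
            # A2: push a range condition into the branch over the same index
            key: conds | ranges.get(key, frozenset())
            for key, conds in imerge.items()
        })
    return SelTree(ranges, imerges)


def or_imerge_lists(a, b, max_opts=MAX_IMERGE_OPTS):
    """Cross-product for (AND of a) OR (AND of b), cut off at max_opts.
    Simplification: the imerges in a and b are assumed to use disjoint indexes,
    so no per-index range union is needed."""
    result = []
    for left, right in product(a, b):
        if len(result) >= max_opts:
            break
        result.append({**left, **right})
    return result


# The O1 example: (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
anded = tree_and(
    SelTree(imerges=[{"key1": frozenset({"key1part2=const"}),
                      "key2": frozenset({"key2part1=const"})}]),
    SelTree(ranges={"key1": frozenset({"key1part1=const"})}),
)
print(sorted(anded.imerges[0]["key1"]))  # ['key1part1=const', 'key1part2=const']
```

The range on key1 is kept as a range candidate and is also ANDed into the key1
branch of the imerge, which matches the shape shown in O1. The cap in
or_imerge_lists() simply stops emitting imerges once the limit is reached; which
options to drop first is a design decision the text leaves open.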


[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply＠askmonty.org 15 Aug '09


-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Sun, 16 Aug 2009, 01:03)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.20767     2009-08-16 01:03:11.000000000 +0300
+++ /tmp/wklog.24.new.20767     2009-08-16 01:03:11.000000000 +0300
@@ -18,6 +18,8 @@
 
   # a range tree has range access options, possibly for several keys
   range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+    (here range(keyi) may represent ranges not for initial keyi prefixes,
+     but ranges for any infixes for keyi)
 
   # merge tree represents several way to index_merge
   imerge_tree = imerge1 AND imerge2 AND ...
@@ -47,13 +49,13 @@
         R.add(range_union(A.range(i), B.range(i)));
 
       if (R has at least one range access)
-        return R;
+        return R;    // DISCARD-IMERGE-2
       else
       {
         /* could not build any range accesses. construct index_merge */
-        remove non-ranges from A; // DISCARD-IMERGE-2
+        remove non-ranges from A;
         remove non-ranges from B;
-        return new index_merge(A, B);
+        return new index_merge(A, B); // DISCARD-IMERGE-3
       }
     }
     else if (A is range tree and B is index_merge tree (or vice versa))
@@ -65,12 +67,12 @@
       (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
       (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
       ...
-      (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND
+      (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN)
       =
       (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
       (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
       ...
-      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N)
 
       Now each line represents an index_merge..
     }
@@ -82,18 +84,18 @@
       OR
       imergeB1 AND imergeB2 AND ... AND imergeBN
 
-      -> (discard all imergeA{i=2,3,...} ->   // DISCARD-IMERGE-3
+      -> (discard all imergeA{i=2,3,...} ->   // DISCARD-IMERGE-4
 
       imergeA1
       OR
-      imergeB1 AND imergeB2 AND ... AND imergeBN =
+      imergeB1 =
 
-      = (combine imergeA1 with each of the imergeB{i} ) =
+      = (combine imergeA1 with each of the range_treeB_1{i} ) =
 
-      combine(imergeA1 OR imergeB1) AND
-      combine(imergeA1 OR imergeB2) AND
+      combine(imergeA1 OR range_treeB_11) AND
+      combine(imergeA1 OR range_treeB_12) AND
       ... AND
-      combine(imergeA1 OR imergeBN)
+      combine(imergeA1 OR range_treeB_1N)
     }
   }
 
@@ -109,7 +111,7 @@
 DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
 the WHERE clause has this form (conditions t.badkey may have abritrary form):
 
-  (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+  (t.badkey<c1 AND t.key1=c1) OR (t.key2=c2 AND t.badkey < c2)
 
 DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
 two indexes:
@@ -123,6 +125,8 @@
 
 The optimizer will generate the plans that only use the "col1=c1" part. The
 right side of the AND will be ignored even if it has good selectivity.
+(Here an imerge for col2=c2 OR col3=c3 won't be built since neither col2=c2 nor
+col3=c3 represent index ranges.)
 
 
 2. New implementation

-=-=(Guest - Mon, 20 Jul 2009, 17:13)=-=-
Dependency deleted: 30 no longer depends on 24

-=-=(Guest - Sat, 20 Jun 2009, 09:34)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.21663     2009-06-20 09:34:48.000000000 +0300
+++ /tmp/wklog.24.new.21663     2009-06-20 09:34:48.000000000 +0300
@@ -4,6 +4,7 @@
 2. New implementation
 2.1 New tree_and()
 2.2 New tree_or()
+3. Testing and required coverage
 </contents>
 
 1. Current implementation overview
@@ -240,3 +241,14 @@
 In order to limit the impact of this combinatorial explosion, we will
 introduce a rule that we won't generate more than #defined
 MAX_IMERGE_OPTS options.
+
+3. Testing and required coverage
+================================
+So far could find the following user cases:
+
+* BUG#17259: Query optimizer chooses wrong index
+* BUG#17673: Optimizer does not use Index Merge optimization in some cases
+* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge
+* BUG#30151: optimizer is very reluctant to chose index_merge algorithm
+
+

-=-=(Guest - Thu, 18 Jun 2009, 16:55)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.19152     2009-06-18 16:55:00.000000000 +0300
+++ /tmp/wklog.24.new.19152     2009-06-18 16:55:00.000000000 +0300
@@ -141,13 +141,15 @@
 Operations on SEL_ARG trees will be modified to produce/process the trees of
 this kind:
 
+
 2.1 New tree_and()
 ------------------
 In order not to lose plans, we'll make these changes:
 
-1. Don't remove index_merge part of the tree.
+A1. Don't remove index_merge part of the tree (this will take care of
+    DISCARD-IMERGE-1 problem)
 
-2. Push range conditions down into index_merge trees that may support them.
+A2. Push range conditions down into index_merge trees that may support them.
    if one tree has range(key1) and the other tree has imerge(key1 OR key2)
    then perform an equvalent of this operation:
 
@@ -155,8 +157,86 @@
 
      (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
 
-3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
    concatenate them together.
 
-2.2 New tree_or()
+2.2 New tree_or()
+-----------------
+O1. Dont remove non-range plans:
+   Current tree_or() code will refuse to produce index_merge plans for
+   conditions like
+
+     "t.key1part2=const OR t.key2part1=const"
+
+   (this is marked as DISCARD-IMERGE-3). This was justifed as the left part of
+   the AND condition is not usable for range access, and the operation of
+   tree_and() guaranteed that there was no way it could changed to make a
+   usable range plan. With new tree_and() and rule A2, this is no longer the
+   case. For example for this query:
+
+     (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
+
+   it will construct a
+
+     imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)
+
+   then tree_and() will apply rule A2 to push the range down into index merge
+   and after that we'll have:
+
+     range(t.key1part1=const)
+     imerge(
+        t.key1part2=const AND t.key1part1=const,
+        t.key2part1=const
+     )
+   note that imerge(...) describes a usable index_merge plan and it's possible
+   that it will be the best access path.
+
+O2. "Create index_merge accesses when possible"
+   Current tree_or() will not create index_merge access when it could create
+   non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems
+   in the current implementation" section). This will be changed to work as
+   follows: we will create index_merge made for index scans that didn't have
+   their match in the other sel_tree.
+   Ilustrating it with an example:
+
+         | sel_tree_A | sel_tree_B | A or B | include in index_merge?
+   ------+------------+------------+--------+------------------------
+   key1  | cond1      | cond2      | condM  | no
+   key2  | cond3      | cond4      | NULL   | no
+   key3  | cond5      |            |        | yes, A-side
+   key4  | cond6      |            |        | yes, A-side
+   key5  |            | cond7      |        | yes, B-side
+   key6  |            | cond8      |        | yes, B-side
+
+   here we assume that
+   - (cond1 OR cond2) did produce a combined range. Not including them in
+     index_merge.
+   - (cond3 OR cond4) didn't produce a usable range (e.g. they were
+     t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them
+     didn't yield any range list)
+   - All other scand didn't have their counterparts, so we'll end up with a
+     SEL_TREE of:
+
+       range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))
+   .
+
+O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is
+that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven
+seen any complaints that could be attributed to it.
+If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to
+lift it ,and produce a cross-product:
+
+   ((key1p OR key2p) AND (key3p OR key4p))
+   OR
+   ((key5p OR key6p) AND (key7p OR key8p))
+
+   = (key1p OR key2p OR key5p OR key6p) AND  // this part is currently
+     (key3p OR key4p OR key5p OR key6p) AND  // produced
+
+     (key1p OR key2p OR key5p OR key6p) AND  // this part will be added
+     (key3p OR key4p OR key5p OR key6p)      //.
+
+In order to limit the impact of this combinatorial explosion, we will
+introduce a rule that we won't generate more than #defined
+MAX_IMERGE_OPTS options.

-=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.15612     2009-06-18 14:56:09.000000000 +0300
+++ /tmp/wklog.24.new.15612     2009-06-18 14:56:09.000000000 +0300
@@ -1 +1,162 @@
+<contents>
+1. Current implementation overview
+1.1. Problems in the current implementation
+2. New implementation
+2.1 New tree_and()
+2.2 New tree_or()
+</contents>
+
+1. Current implementation overview
+==================================
+At the moment, range analyzer works as follows:
+
+SEL_TREE structure represents
+
+  # There are sel_trees, a sel_tree is either range or merge tree
+  sel_tree = range_tree | imerge_tree
+
+  # a range tree has range access options, possibly for several keys
+  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+
+  # merge tree represents several way to index_merge
+  imerge_tree = imerge1 AND imerge2 AND ...
+
+  # a way to do index merge == a set to use of different indexes.
+  imergeX = range_tree1 OR range_tree2 OR ..
+    where no pair of range_treeX have ranges over the same index.
+
+
+  tree_and(A, B)
+  {
+    if (both A and B are range trees)
+      return a range_tree with computed intersection for each range;
+    if (only one of A and B is a range tree)
+      return that tree; // DISCARD-IMERGE-1
+    // at this point both trees are index_merge trees
+    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
+  }
+
+
+  tree_or(A, B)
+  {
+    if (A and B are range trees)
+    {
+      R = new range_tree;
+      for each index i
+        R.add(range_union(A.range(i), B.range(i)));
+
+      if (R has at least one range access)
+        return R;
+      else
+      {
+        /* could not build any range accesses. construct index_merge */
+        remove non-ranges from A; // DISCARD-IMERGE-2
+        remove non-ranges from B;
+        return new index_merge(A, B);
+      }
+    }
+    else if (A is range tree and B is index_merge tree (or vice versa))
+    {
+      Perform this transformation:
+
+      range_treeA    // this is A
+      OR
+      (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
+      (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
+      ...
+      (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND
+      =
+      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+      (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
+      ...
+      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+
+      Now each line represents an index_merge..
+    }
+    else if (both A and B are index_merge trees)
+    {
+      Perform this transformation:
+
+      imergeA1 AND imergeA2 AND ... AND imergeAN
+      OR
+      imergeB1 AND imergeB2 AND ... AND imergeBN
+
+      -> (discard all imergeA{i=2,3,...} ->   // DISCARD-IMERGE-3
+
+      imergeA1
+      OR
+      imergeB1 AND imergeB2 AND ... AND imergeBN =
+
+      = (combine imergeA1 with each of the imergeB{i} ) =
+
+      combine(imergeA1 OR imergeB1) AND
+      combine(imergeA1 OR imergeB2) AND
+      ... AND
+      combine(imergeA1 OR imergeBN)
+    }
+  }
+
+1.1. Problems in the current implementation
+-------------------------------------------
+As marked in the code above:
+
+DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
+the WHERE clause has this form:
+
+  (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
+
+DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
+the WHERE clause has this form (conditions t.badkey may have abritrary form):
+
+  (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+
+DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
+two indexes:
+
+  INDEX i1(col1, col2),
+  INDEX i2(col1, col3)
+
+and this WHERE clause:
+
+  col1=c1 AND (col2=c2 OR col3=c3)
+
+The optimizer will generate the plans that only use the "col1=c1" part. The
+right side of the AND will be ignored even if it has good selectivity.
+
+
+2. New implementation
+=====================
+
+<general idea>
+* Don't start fighting combinatorial explosion until we've actually got one.
+</>
+
+SEL_TREE structure will be now able to hold both index_merge and range scan
+candidates at the same time. That is,
+
+  sel_tree2 = range_tree AND imerge_tree
+
+where both parts are optional (i.e. can be empty)
+
+Operations on SEL_ARG trees will be modified to produce/process the trees of
+this kind:
+
+2.1 New tree_and()
+------------------
+In order not to lose plans, we'll make these changes:
+
+1. Don't remove index_merge part of the tree.
+
+2. Push range conditions down into index_merge trees that may support them.
+   if one tree has range(key1) and the other tree has imerge(key1 OR key2)
+   then perform an equvalent of this operation:
+
+     rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
+
+     (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
+
+3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+   concatenate them together.
+
+2.2 New tree_or()

-=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=-
Dependency created: 30 now depends on 24

-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580     2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580     2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
 
 * How strict is the limitation on the form of the WHERE?
 
+* Which version should this be based on? 5.1? Which patches are should be in
+  (google's/percona's/maria/etc?)
+
 * TODO: The optimizer didn't compare costs of index_merge and range before (ok
   it did but that was done for accesses to different tables). Will there be any
   possible gotchas here?

-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498      2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498      2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access

-=-=(Guest - Tue, 26 May 2009, 13:27)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.305       2009-05-26 13:27:32.000000000 +0300
+++ /tmp/wklog.24.new.305       2009-05-26 13:27:32.000000000 +0300
@@ -1 +1,70 @@
+(Not a ready HLS but draft)
+<contents>
+Solution overview
+Limitations
+TODO
+
+</contents>
+
+Solution overview
+=================
+The idea is to delay discarding potential index_merge plans until the point
+where it is really necessary.
+
+This way, we won't have to do much changes in the range analyzer, but will be
+able to keep potential index_merge plan just enough so that it's possible to
+take it into consideration together with range access plans.
+
+Since there are no changes in the optimizer, the ability to consider both
+range and index_merge options will be limited to WHERE clauses of this form:
+
+  WHERE := range_cond(key1_1) AND
+           range_cond(key2_1) AND
+           other_cond AND
+           index_merge_OR_cond1(key3_1, key3_2, ...)
+ index_merge_OR_cond2(key4_1, key4_2, ...) + +where + + index_merge_OR_cond{N} := (range_cond(keyN_1) OR + range_cond(keyN_2) OR ...) + + + range_cond(keyX) := condition that allows to construct range access of keyX + and doesn't allow to construct range/index_merge accesses + for any keys of the table in question. + + +For such WHERE clauses, the range analyzer will produce SEL_TREE of this form: + + SEL_TREE( + range(key1_1), + ... + range(key2_1), + SEL_IMERGE( (1) + SEL_TREE(key3_1}) + SEL_TREE(key3_2}) + ... + ) + ... + ) + +which can be used to make a cost-based choice between range and index_merge. + +Limitations +----------- +This will not be a full solution in a sense that the range analyzer will not +be able to produce sel_tree (1) if the WHERE clause is specified in other form +(e.g. brackets were opened). + +TODO +---- +* is it a problem if there are keys that are referred to both from + index_merge and from range access? + +* How strict is the limitation on the form of the WHERE? + +* TODO: The optimizer didn't compare costs of index_merge and range before (ok + it did but that was done for accesses to different tables). Will there be any + possible gotchas here? DESCRIPTION: Current range optimizer will discard possible index_merge/[sort]union strategies when there is a possible range plan. This action is a part of measures we take to avoid combinatorial explosion of possible range/ index_merge strategies. A bad side effect of this is that for WHERE clauses in form t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2') the optimizer will - discard union(key2,key3) in favor of range(key1) - consider costs of using range(key1) and discard that plan also and the overall effect is that possible poor range access will cause possible good index_merge access not to be considered. This WL is to about lifting this limitation at least for some subset of WHERE clauses. 
HIGH-LEVEL SPECIFICATION:

(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO

</contents>

Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point
where it is really necessary.

This way, we won't have to make many changes in the range analyzer, but will
be able to keep a potential index_merge plan just long enough to take it into
consideration together with range access plans.

Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:

  WHERE := range_cond(key1_1) AND
           range_cond(key2_1) AND
           other_cond AND
           index_merge_OR_cond1(key3_1, key3_2, ...) AND
           index_merge_OR_cond2(key4_1, key4_2, ...)

where

  index_merge_OR_cond{N} := (range_cond(keyN_1) OR
                             range_cond(keyN_2) OR ...)


  range_cond(keyX) := condition that allows constructing range access on keyX
                      and doesn't allow constructing range/index_merge accesses
                      for any keys of the table in question.


For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:

  SEL_TREE(
    range(key1_1),
    ...
    range(key2_1),
    SEL_IMERGE( (1)
      SEL_TREE(key3_1)
      SEL_TREE(key3_2)
      ...
    )
    ...
  )

which can be used to make a cost-based choice between range and index_merge.

Limitations
-----------
This will not be a full solution in the sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in another
form (e.g. brackets were opened).

TODO
----
* Is it a problem if there are keys that are referred to both from
  index_merge and from range access?

* How strict is the limitation on the form of the WHERE?

* Which version should this be based on? 5.1? Which patches should be in
  (google's/percona's/maria/etc?)

* TODO: The optimizer didn't compare costs of index_merge and range before (ok
  it did but that was done for accesses to different tables). Will there be any
  possible gotchas here?

LOW-LEVEL DESIGN:

<contents>
1. Current implementation overview
1.1. Problems in the current implementation
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
3. Testing and required coverage
</contents>

1. Current implementation overview
==================================
At the moment, range analyzer works as follows:

SEL_TREE structure represents

  # There are sel_trees, a sel_tree is either range or merge tree
  sel_tree = range_tree | imerge_tree

  # a range tree has range access options, possibly for several keys
  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
  (here range(keyi) may represent ranges not for initial keyi prefixes,
   but ranges for any infixes for keyi)

  # merge tree represents several ways to index_merge
  imerge_tree = imerge1 AND imerge2 AND ...

  # a way to do index merge == a set of different indexes to use.
  imergeX = range_tree1 OR range_tree2 OR ..
    where no pair of range_treeX have ranges over the same index.


  tree_and(A, B)
  {
    if (both A and B are range trees)
      return a range_tree with computed intersection for each range;
    if (only one of A and B is a range tree)
      return that tree; // DISCARD-IMERGE-1
    // at this point both trees are index_merge trees
    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
  }


  tree_or(A, B)
  {
    if (A and B are range trees)
    {
      R = new range_tree;
      for each index i
        R.add(range_union(A.range(i), B.range(i)));

      if (R has at least one range access)
        return R;    // DISCARD-IMERGE-2
      else
      {
        /* could not build any range accesses. construct index_merge */
        remove non-ranges from A;
        remove non-ranges from B;
        return new index_merge(A, B); // DISCARD-IMERGE-3
      }
    }
    else if (A is range tree and B is index_merge tree (or vice versa))
    {
      Perform this transformation:

        range_treeA    // this is A
        OR
        (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
        (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
        ...
        (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_KN)
       =
        (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
        (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
        ...
        (range_treeA OR range_treeB_K1 OR ... OR range_treeB_KN)

      Now each line represents an index_merge.
    }
    else if (both A and B are index_merge trees)
    {
      Perform this transformation:

        imergeA1 AND imergeA2 AND ... AND imergeAN
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN

        -> (discard all imergeA{i=2,3,...}) ->   // DISCARD-IMERGE-4

        imergeA1
        OR
        imergeB1 =

        = (combine imergeA1 with each of the range_treeB_1{i} ) =

        combine(imergeA1 OR range_treeB_11) AND
        combine(imergeA1 OR range_treeB_12) AND
        ... AND
        combine(imergeA1 OR range_treeB_1N)
    }
  }

1.1. Problems in the current implementation
-------------------------------------------
As marked in the code above:

DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
the WHERE clause has this form:

  (t.key1=c1 OR t.key2=c2) AND t.badkey < c3

DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
the WHERE clause has this form (conditions on t.badkey may have arbitrary
form):

  (t.badkey<c1 AND t.key1=c1) OR (t.key2=c2 AND t.badkey < c2)

DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
two indexes:

  INDEX i1(col1, col2),
  INDEX i2(col1, col3)

and this WHERE clause:

  col1=c1 AND (col2=c2 OR col3=c3)

The optimizer will generate the plans that only use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.
(Here an imerge for col2=c2 OR col3=c3 won't be built since neither col2=c2 nor
col3=c3 represents an index range.)


2. New implementation
=====================

<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>

SEL_TREE structure will now be able to hold both index_merge and range scan
candidates at the same time. That is,

  sel_tree2 = range_tree AND imerge_tree

where both parts are optional (i.e.
can be empty)

Operations on SEL_ARG trees will be modified to produce/process the trees of
this kind:


2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:

A1. Don't remove index_merge part of the tree (this will take care of
    DISCARD-IMERGE-1 problem)

A2. Push range conditions down into index_merge trees that may support them.
    If one tree has range(key1) and the other tree has imerge(key1 OR key2)
    then perform an equivalent of this operation:

      rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =

      (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))

A3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
    concatenate them together.

2.2 New tree_or()
-----------------
O1. Don't remove non-range plans:
    Current tree_or() code will refuse to produce index_merge plans for
    conditions like

      "t.key1part2=const OR t.key2part1=const"

    (this is marked as DISCARD-IMERGE-3). This was justified as the left part
    of the AND condition is not usable for range access, and the operation of
    tree_and() guaranteed that there was no way it could be changed to make a
    usable range plan. With new tree_and() and rule A2, this is no longer the
    case. For example for this query:

      (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const

    it will construct a

      imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const)

    then tree_and() will apply rule A2 to push the range down into index merge
    and after that we'll have:

      range(t.key1part1=const)
      imerge(
        t.key1part2=const AND t.key1part1=const,
        t.key2part1=const
      )

    Note that imerge(...) describes a usable index_merge plan and it's possible
    that it will be the best access path.

O2. "Create index_merge accesses when possible"
    Current tree_or() will not create index_merge access when it could create
    non-index merge access (see DISCARD-IMERGE-3 and its example in the
    "Problems in the current implementation" section). This will be changed to
    work as follows: we will create an index_merge made of the index scans that
    didn't have their match in the other sel_tree.
    Illustrating it with an example:

          | sel_tree_A | sel_tree_B | A or B | include in index_merge?
    ------+------------+------------+--------+------------------------
    key1  | cond1      | cond2      | condM  | no
    key2  | cond3      | cond4      | NULL   | no
    key3  | cond5      |            |        | yes, A-side
    key4  | cond6      |            |        | yes, A-side
    key5  |            | cond7      |        | yes, B-side
    key6  |            | cond8      |        | yes, B-side

    here we assume that
    - (cond1 OR cond2) did produce a combined range. Not including them in
      index_merge.
    - (cond3 OR cond4) didn't produce a usable range (e.g. they were
      t.key1part1=c1 and t.key1part2=c1, respectively, and combining them
      didn't yield any range list)
    - All other scans didn't have their counterparts, so we'll end up with a
      SEL_TREE of:

        range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8))

O4. There is no O4. DISCARD-IMERGE-4 will remain there. The idea is
that although DISCARD-IMERGE-4 does discard plans, so far we haven't
seen any complaints that could be attributed to it.
If we face the need to lift DISCARD-IMERGE-4, our answer will be to
lift it and produce a cross-product:

   ((key1p OR key2p) AND (key3p OR key4p))
   OR
   ((key5p OR key6p) AND (key7p OR key8p))

   = (key1p OR key2p OR key5p OR key6p) AND  // this part is currently
     (key3p OR key4p OR key5p OR key6p) AND  // produced

     (key1p OR key2p OR key7p OR key8p) AND  // this part will be added
     (key3p OR key4p OR key7p OR key8p)

In order to limit the impact of this combinatorial explosion, we will
introduce a rule that we won't generate more than #defined
MAX_IMERGE_OPTS options.

3. Testing and required coverage
================================
So far we could find the following user cases:

* BUG#17259: Query optimizer chooses wrong index
* BUG#17673: Optimizer does not use Index Merge optimization in some cases
* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge
* BUG#30151: optimizer is very reluctant to chose index_merge algorithm


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
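
A minimal sketch of rules A1-A3 from section 2.1, replaying the example given
under O1 in section 2.2. This is an illustration only and not the server's
code: SelTree, and_ranges() and push_down() are hypothetical names, and a
"range" on a key is modelled as nothing more than the set of conditions AND-ed
together on that key (the real SEL_ARG interval lists are not modelled).

```python
from dataclasses import dataclass, field


@dataclass
class SelTree:
    """Hypothetical stand-in for the proposed sel_tree2 = range_tree AND imerge_tree."""
    # range_tree part: key name -> frozenset of conditions AND-ed on that key
    ranges: dict = field(default_factory=dict)
    # imerge_tree part: list of imerges; each imerge is a list of range trees
    # (dicts shaped like `ranges`) that are OR-ed together
    imerges: list = field(default_factory=list)


def and_ranges(a, b):
    """AND two range trees: conditions on the same key are combined."""
    out = dict(a)
    for key, conds in b.items():
        out[key] = out.get(key, frozenset()) | conds
    return out


def push_down(ranges, imerge):
    """Rule A2: AND a range into each imerge alternative that uses the same key."""
    pushed = []
    for alt in imerge:
        usable = {k: c for k, c in ranges.items() if k in alt}
        pushed.append(and_ranges(alt, usable))
    return pushed


def tree_and(a, b):
    ranges = and_ranges(a.ranges, b.ranges)
    # A1: the imerge part is kept even when a range part exists.
    # A3: imerges of both sides are concatenated.
    # A2: the combined ranges are pushed into every kept imerge.
    imerges = [push_down(ranges, im) for im in a.imerges + b.imerges]
    return SelTree(ranges, imerges)


# (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
left = SelTree(imerges=[[
    {"key1": frozenset({"t.key1part2=const"})},
    {"key2": frozenset({"t.key2part1=const"})},
]])
right = SelTree(ranges={"key1": frozenset({"t.key1part1=const"})})
result = tree_and(left, right)
print(result)
```

In this model the result keeps range(t.key1part1=const) and an imerge whose
key1 alternative carries both key1 conditions, while the key2 alternative is
left untouched, which is the shape described under O1.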

[Maria-developers] Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)
by worklog-noreply＠askmonty.org 15 Aug '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: index_merge: fair choice between index_merge union and range access CREATION DATE..: Tue, 26 May 2009, 12:10 SUPERVISOR.....: Monty IMPLEMENTOR....: Psergey COPIES TO......: Psergey CATEGORY.......: Server-Sprint TASK ID........: 24 (http://askmonty.org/worklog/?tid=24) VERSION........: Server-9.x STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Sun, 16 Aug 2009, 01:03)=-=- Low Level Design modified. --- /tmp/wklog.24.old.20767 2009-08-16 01:03:11.000000000 +0300 +++ /tmp/wklog.24.new.20767 2009-08-16 01:03:11.000000000 +0300 @@ -18,6 +18,8 @@ # a range tree has range access options, possibly for several keys range_tree = range(key1) AND range(key2) AND ... AND range(keyN); + (here range(keyi) may represent ranges not for initial keyi prefixes, + but ranges for any infixes for keyi) # merge tree represents several way to index_merge imerge_tree = imerge1 AND imerge2 AND ... @@ -47,13 +49,13 @@ R.add(range_union(A.range(i), B.range(i))); if (R has at least one range access) - return R; + return R; // DISCARD-IMERGE-2 else { /* could not build any range accesses. construct index_merge */ - remove non-ranges from A; // DISCARD-IMERGE-2 + remove non-ranges from A; remove non-ranges from B; - return new index_merge(A, B); + return new index_merge(A, B); // DISCARD-IMERGE-3 } } else if (A is range tree and B is index_merge tree (or vice versa)) @@ -65,12 +67,12 @@ (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND ... - (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND + (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) = (range_treeA OR range_treeB_11 OR ... 
OR range_treeB_1N) AND (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND ... - (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND + (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) Now each line represents an index_merge.. } @@ -82,18 +84,18 @@ OR imergeB1 AND imergeB2 AND ... AND imergeBN - -> (discard all imergeA{i=2,3,...} -> // DISCARD-IMERGE-3 + -> (discard all imergeA{i=2,3,...} -> // DISCARD-IMERGE-4 imergeA1 OR - imergeB1 AND imergeB2 AND ... AND imergeBN = + imergeB1 = - = (combine imergeA1 with each of the imergeB{i} ) = + = (combine imergeA1 with each of the range_treeB_1{i} ) = - combine(imergeA1 OR imergeB1) AND - combine(imergeA1 OR imergeB2) AND + combine(imergeA1 OR range_treeB_11) AND + combine(imergeA1 OR range_treeB_12) AND ... AND - combine(imergeA1 OR imergeBN) + combine(imergeA1 OR range_treeB_1N) } } @@ -109,7 +111,7 @@ DISCARD-IMERGE-2 step will cause index_merge option to be discarded when the WHERE clause has this form (conditions t.badkey may have abritrary form): - (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2) + (t.badkey<c1 AND t.key1=c1) OR (t.key2=c2 AND t.badkey < c2) DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are two indexes: @@ -123,6 +125,8 @@ The optimizer will generate the plans that only use the "col1=c1" part. The right side of the AND will be ignored even if it has good selectivity. +(Here an imerge for col2=c2 OR col3=c3 won't be built since neither col2=c2 nor +col3=c3 represent index ranges.) 2. New implementation -=-=(Guest - Mon, 20 Jul 2009, 17:13)=-=- Dependency deleted: 30 no longer depends on 24 -=-=(Guest - Sat, 20 Jun 2009, 09:34)=-=- Low Level Design modified. --- /tmp/wklog.24.old.21663 2009-06-20 09:34:48.000000000 +0300 +++ /tmp/wklog.24.new.21663 2009-06-20 09:34:48.000000000 +0300 @@ -4,6 +4,7 @@ 2. New implementation 2.1 New tree_and() 2.2 New tree_or() +3. Testing and required coverage </contents> 1. 
Current implementation overview @@ -240,3 +241,14 @@ In order to limit the impact of this combinatorial explosion, we will introduce a rule that we won't generate more than #defined MAX_IMERGE_OPTS options. + +3. Testing and required coverage +================================ +So far could find the following user cases: + +* BUG#17259: Query optimizer chooses wrong index +* BUG#17673: Optimizer does not use Index Merge optimization in some cases +* BUG#23322: Optimizer sometimes erroniously prefers other index over index merge +* BUG#30151: optimizer is very reluctant to chose index_merge algorithm + + -=-=(Guest - Thu, 18 Jun 2009, 16:55)=-=- Low Level Design modified. --- /tmp/wklog.24.old.19152 2009-06-18 16:55:00.000000000 +0300 +++ /tmp/wklog.24.new.19152 2009-06-18 16:55:00.000000000 +0300 @@ -141,13 +141,15 @@ Operations on SEL_ARG trees will be modified to produce/process the trees of this kind: + 2.1 New tree_and() ------------------ In order not to lose plans, we'll make these changes: -1. Don't remove index_merge part of the tree. +A1. Don't remove index_merge part of the tree (this will take care of + DISCARD-IMERGE-1 problem) -2. Push range conditions down into index_merge trees that may support them. +A2. Push range conditions down into index_merge trees that may support them. if one tree has range(key1) and the other tree has imerge(key1 OR key2) then perform an equvalent of this operation: @@ -155,8 +157,86 @@ (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2)) -3. Just as before: if both sel_tree A and sel_tree B have index_merge options, +A3. Just as before: if both sel_tree A and sel_tree B have index_merge options, concatenate them together. -2.2 New tree_or() +2.2 New tree_or() +----------------- +O1. Dont remove non-range plans: + Current tree_or() code will refuse to produce index_merge plans for + conditions like + + "t.key1part2=const OR t.key2part1=const" + + (this is marked as DISCARD-IMERGE-3). 
This was justifed as the left part of + the AND condition is not usable for range access, and the operation of + tree_and() guaranteed that there was no way it could changed to make a + usable range plan. With new tree_and() and rule A2, this is no longer the + case. For example for this query: + + (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const + + it will construct a + + imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const) + + then tree_and() will apply rule A2 to push the range down into index merge + and after that we'll have: + + range(t.key1part1=const) + imerge( + t.key1part2=const AND t.key1part1=const, + t.key2part1=const + ) + note that imerge(...) describes a usable index_merge plan and it's possible + that it will be the best access path. + +O2. "Create index_merge accesses when possible" + Current tree_or() will not create index_merge access when it could create + non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems + in the current implementation" section). This will be changed to work as + follows: we will create index_merge made for index scans that didn't have + their match in the other sel_tree. + Ilustrating it with an example: + + | sel_tree_A | sel_tree_B | A or B | include in index_merge? + ------+------------+------------+--------+------------------------ + key1 | cond1 | cond2 | condM | no + key2 | cond3 | cond4 | NULL | no + key3 | cond5 | | | yes, A-side + key4 | cond6 | | | yes, A-side + key5 | | cond7 | | yes, B-side + key6 | | cond8 | | yes, B-side + + here we assume that + - (cond1 OR cond2) did produce a combined range. Not including them in + index_merge. + - (cond3 OR cond4) didn't produce a usable range (e.g. 
they were + t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them + didn't yield any range list) + - All other scand didn't have their counterparts, so we'll end up with a + SEL_TREE of: + + range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8)) + . + +O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is +that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven +seen any complaints that could be attributed to it. +If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to +lift it ,and produce a cross-product: + + ((key1p OR key2p) AND (key3p OR key4p)) + OR + ((key5p OR key6p) AND (key7p OR key8p)) + + = (key1p OR key2p OR key5p OR key6p) AND // this part is currently + (key3p OR key4p OR key5p OR key6p) AND // produced + + (key1p OR key2p OR key5p OR key6p) AND // this part will be added + (key3p OR key4p OR key5p OR key6p) //. + +In order to limit the impact of this combinatorial explosion, we will +introduce a rule that we won't generate more than #defined +MAX_IMERGE_OPTS options. -=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=- Low Level Design modified. --- /tmp/wklog.24.old.15612 2009-06-18 14:56:09.000000000 +0300 +++ /tmp/wklog.24.new.15612 2009-06-18 14:56:09.000000000 +0300 @@ -1 +1,162 @@ +<contents> +1. Current implementation overview +1.1. Problems in the current implementation +2. New implementation +2.1 New tree_and() +2.2 New tree_or() +</contents> + +1. Current implementation overview +================================== +At the moment, range analyzer works as follows: + +SEL_TREE structure represents + + # There are sel_trees, a sel_tree is either range or merge tree + sel_tree = range_tree | imerge_tree + + # a range tree has range access options, possibly for several keys + range_tree = range(key1) AND range(key2) AND ... AND range(keyN); + + # merge tree represents several way to index_merge + imerge_tree = imerge1 AND imerge2 AND ... 
+ + # a way to do index merge == a set to use of different indexes. + imergeX = range_tree1 OR range_tree2 OR .. + where no pair of range_treeX have ranges over the same index. + + + tree_and(A, B) + { + if (both A and B are range trees) + return a range_tree with computed intersection for each range; + if (only one of A and B is a range tree) + return that tree; // DISCARD-IMERGE-1 + // at this point both trees are index_merge trees + return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN); + } + + + tree_or(A, B) + { + if (A and B are range trees) + { + R = new range_tree; + for each index i + R.add(range_union(A.range(i), B.range(i))); + + if (R has at least one range access) + return R; + else + { + /* could not build any range accesses. construct index_merge */ + remove non-ranges from A; // DISCARD-IMERGE-2 + remove non-ranges from B; + return new index_merge(A, B); + } + } + else if (A is range tree and B is index_merge tree (or vice versa)) + { + Perform this transformation: + + range_treeA // this is A + OR + (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND + (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND + ... + (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND + = + (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND + (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND + ... + (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND + + Now each line represents an index_merge.. + } + else if (both A and B are index_merge trees) + { + Perform this transformation: + + imergeA1 AND imergeA2 AND ... AND imergeAN + OR + imergeB1 AND imergeB2 AND ... AND imergeBN + + -> (discard all imergeA{i=2,3,...} -> // DISCARD-IMERGE-3 + + imergeA1 + OR + imergeB1 AND imergeB2 AND ... AND imergeBN = + + = (combine imergeA1 with each of the imergeB{i} ) = + + combine(imergeA1 OR imergeB1) AND + combine(imergeA1 OR imergeB2) AND + ... AND + combine(imergeA1 OR imergeBN) + } + } + +1.1. 
Problems in the current implementation +------------------------------------------- +As marked in the code above: + +DISCARD-IMERGE-1 step will cause index_merge option to be discarded when +the WHERE clause has this form: + + (t.key1=c1 OR t.key2=c2) AND t.badkey < c3 + +DISCARD-IMERGE-2 step will cause index_merge option to be discarded when +the WHERE clause has this form (conditions t.badkey may have abritrary form): + + (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2) + +DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are +two indexes: + + INDEX i1(col1, col2), + INDEX i2(col1, col3) + +and this WHERE clause: + + col1=c1 AND (col2=c2 OR col3=c3) + +The optimizer will generate the plans that only use the "col1=c1" part. The +right side of the AND will be ignored even if it has good selectivity. + + +2. New implementation +===================== + +<general idea> +* Don't start fighting combinatorial explosion until we've actually got one. +</> + +SEL_TREE structure will be now able to hold both index_merge and range scan +candidates at the same time. That is, + + sel_tree2 = range_tree AND imerge_tree + +where both parts are optional (i.e. can be empty) + +Operations on SEL_ARG trees will be modified to produce/process the trees of +this kind: + +2.1 New tree_and() +------------------ +In order not to lose plans, we'll make these changes: + +1. Don't remove index_merge part of the tree. + +2. Push range conditions down into index_merge trees that may support them. + if one tree has range(key1) and the other tree has imerge(key1 OR key2) + then perform an equvalent of this operation: + + rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) = + + (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2)) + +3. Just as before: if both sel_tree A and sel_tree B have index_merge options, + concatenate them together. 
+ +2.2 New tree_or() -=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=- Dependency created: 30 now depends on 24 -=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=- High-Level Specification modified. --- /tmp/wklog.24.old.21580 2009-06-01 23:30:06.000000000 +0300 +++ /tmp/wklog.24.new.21580 2009-06-01 23:30:06.000000000 +0300 @@ -64,6 +64,9 @@ * How strict is the limitation on the form of the WHERE? +* Which version should this be based on? 5.1? Which patches are should be in + (google's/percona's/maria/etc?) + * TODO: The optimizer didn't compare costs of index_merge and range before (ok it did but that was done for accesses to different tables). Will there be any possible gotchas here? -=-=(Guest - Wed, 27 May 2009, 13:59)=-=- Title modified. --- /tmp/wklog.24.old.9498 2009-05-27 13:59:23.000000000 +0300 +++ /tmp/wklog.24.new.9498 2009-05-27 13:59:23.000000000 +0300 @@ -1 +1 @@ -index_merge optimizer: dont discard index_merge union strategies when range is available +index_merge: fair choice between index_merge union and range access -=-=(Guest - Tue, 26 May 2009, 13:27)=-=- High-Level Specification modified. --- /tmp/wklog.24.old.305 2009-05-26 13:27:32.000000000 +0300 +++ /tmp/wklog.24.new.305 2009-05-26 13:27:32.000000000 +0300 @@ -1 +1,70 @@ +(Not a ready HLS but draft) +<contents> +Solution overview +Limitations +TODO + +</contents> + +Solution overview +================= +The idea is to delay discarding potential index_merge plans until the point +where it is really necessary. + +This way, we won't have to do much changes in the range analyzer, but will be +able to keep potential index_merge plan just enough so that it's possible to +take it into consideration together with range access plans. + +Since there are no changes in the optimizer, the ability to consider both +range and index_merge options will be limited to WHERE clauses of this form: + + WHERE := range_cond(key1_1) AND + range_cond(key2_1) AND + other_cond AND + index_merge_OR_cond1(key3_1, key3_2, ...) 
+ index_merge_OR_cond2(key4_1, key4_2, ...) + +where + + index_merge_OR_cond{N} := (range_cond(keyN_1) OR + range_cond(keyN_2) OR ...) + + + range_cond(keyX) := condition that allows to construct range access of keyX + and doesn't allow to construct range/index_merge accesses + for any keys of the table in question. + + +For such WHERE clauses, the range analyzer will produce SEL_TREE of this form: + + SEL_TREE( + range(key1_1), + ... + range(key2_1), + SEL_IMERGE( (1) + SEL_TREE(key3_1}) + SEL_TREE(key3_2}) + ... + ) + ... + ) + +which can be used to make a cost-based choice between range and index_merge. + +Limitations +----------- +This will not be a full solution in a sense that the range analyzer will not +be able to produce sel_tree (1) if the WHERE clause is specified in other form +(e.g. brackets were opened). + +TODO +---- +* is it a problem if there are keys that are referred to both from + index_merge and from range access? + +* How strict is the limitation on the form of the WHERE? + +* TODO: The optimizer didn't compare costs of index_merge and range before (ok + it did but that was done for accesses to different tables). Will there be any + possible gotchas here? DESCRIPTION: Current range optimizer will discard possible index_merge/[sort]union strategies when there is a possible range plan. This action is a part of measures we take to avoid combinatorial explosion of possible range/ index_merge strategies. A bad side effect of this is that for WHERE clauses in form t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2') the optimizer will - discard union(key2,key3) in favor of range(key1) - consider costs of using range(key1) and discard that plan also and the overall effect is that possible poor range access will cause possible good index_merge access not to be considered. This WL is to about lifting this limitation at least for some subset of WHERE clauses. 
HIGH-LEVEL SPECIFICATION:

(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO

</contents>

Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point
where it is really necessary.

This way, we won't have to make many changes in the range analyzer, but will
be able to keep a potential index_merge plan just long enough to take it into
consideration together with range access plans.

Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:

  WHERE := range_cond(key1_1) AND
           range_cond(key2_1) AND
           other_cond AND
           index_merge_OR_cond1(key3_1, key3_2, ...) AND
           index_merge_OR_cond2(key4_1, key4_2, ...)

where

  index_merge_OR_cond{N} := (range_cond(keyN_1) OR
                             range_cond(keyN_2) OR ...)


  range_cond(keyX) := condition that allows constructing range access on keyX
                      and doesn't allow constructing range/index_merge accesses
                      for any keys of the table in question.


For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:

  SEL_TREE(
    range(key1_1),
    ...
    range(key2_1),
    SEL_IMERGE( (1)
      SEL_TREE(key3_1)
      SEL_TREE(key3_2)
      ...
    )
    ...
  )

which can be used to make a cost-based choice between range and index_merge.

Limitations
-----------
This will not be a full solution in the sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in another
form (e.g. brackets were opened).

TODO
----
* Is it a problem if there are keys that are referred to both from
  index_merge and from range access?

* How strict is the limitation on the form of the WHERE?

* Which version should this be based on? 5.1? Which patches should be in
  (google's/percona's/maria/etc?)

* TODO: The optimizer didn't compare costs of index_merge and range before (ok
  it did but that was done for accesses to different tables). Will there be any
  possible gotchas here?

LOW-LEVEL DESIGN:

<contents>
1. Current implementation overview
1.1. Problems in the current implementation
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
3. Testing and required coverage
</contents>

1. Current implementation overview
==================================
At the moment, range analyzer works as follows:

SEL_TREE structure represents

  # There are sel_trees, a sel_tree is either range or merge tree
  sel_tree = range_tree | imerge_tree

  # a range tree has range access options, possibly for several keys
  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
  (here range(keyi) may represent ranges not for initial keyi prefixes,
   but ranges for any infixes for keyi)

  # merge tree represents several ways to index_merge
  imerge_tree = imerge1 AND imerge2 AND ...

  # a way to do index merge == a set of different indexes to use.
  imergeX = range_tree1 OR range_tree2 OR ..
    where no pair of range_treeX have ranges over the same index.


  tree_and(A, B)
  {
    if (both A and B are range trees)
      return a range_tree with computed intersection for each range;
    if (only one of A and B is a range tree)
      return that tree; // DISCARD-IMERGE-1
    // at this point both trees are index_merge trees
    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
  }


  tree_or(A, B)
  {
    if (A and B are range trees)
    {
      R = new range_tree;
      for each index i
        R.add(range_union(A.range(i), B.range(i)));

      if (R has at least one range access)
        return R;    // DISCARD-IMERGE-2
      else
      {
        /* could not build any range accesses. construct index_merge */
        remove non-ranges from A;
        remove non-ranges from B;
        return new index_merge(A, B); // DISCARD-IMERGE-3
      }
    }
    else if (A is range tree and B is index_merge tree (or vice versa))
    {
      Perform this transformation:

        range_treeA    // this is A
        OR
        (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
        (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
        ...
        (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_KN)
       =
        (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
        (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
        ...
        (range_treeA OR range_treeB_K1 OR ... OR range_treeB_KN)

      Now each line represents an index_merge.
    }
    else if (both A and B are index_merge trees)
    {
      Perform this transformation:

        imergeA1 AND imergeA2 AND ... AND imergeAN
        OR
        imergeB1 AND imergeB2 AND ... AND imergeBN

        -> (discard all imergeA{i=2,3,...}) ->   // DISCARD-IMERGE-4

        imergeA1
        OR
        imergeB1 =

        = (combine imergeA1 with each of the range_treeB_1{i} ) =

        combine(imergeA1 OR range_treeB_11) AND
        combine(imergeA1 OR range_treeB_12) AND
        ... AND
        combine(imergeA1 OR range_treeB_1N)
    }
  }

1.1. Problems in the current implementation
-------------------------------------------
As marked in the code above:

DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
the WHERE clause has this form:

  (t.key1=c1 OR t.key2=c2) AND t.badkey < c3

DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
the WHERE clause has this form (conditions on t.badkey may have arbitrary
form):

  (t.badkey<c1 AND t.key1=c1) OR (t.key2=c2 AND t.badkey < c2)

DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are
two indexes:

  INDEX i1(col1, col2),
  INDEX i2(col1, col3)

and this WHERE clause:

  col1=c1 AND (col2=c2 OR col3=c3)

The optimizer will generate the plans that only use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.
(Here an imerge for col2=c2 OR col3=c3 won't be built since neither col2=c2 nor
col3=c3 represents an index range.)


2. New implementation
=====================

<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>

SEL_TREE structure will now be able to hold both index_merge and range scan
candidates at the same time. That is,

  sel_tree2 = range_tree AND imerge_tree

where both parts are optional (i.e.
can be empty) Operations on SEL_ARG trees will be modified to produce/process the trees of this kind: 2.1 New tree_and() ------------------ In order not to lose plans, we'll make these changes: A1. Don't remove index_merge part of the tree (this will take care of DISCARD-IMERGE-1 problem) A2. Push range conditions down into index_merge trees that may support them. if one tree has range(key1) and the other tree has imerge(key1 OR key2) then perform an equvalent of this operation: rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) = (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2)) A3. Just as before: if both sel_tree A and sel_tree B have index_merge options, concatenate them together. 2.2 New tree_or() ----------------- O1. Dont remove non-range plans: Current tree_or() code will refuse to produce index_merge plans for conditions like "t.key1part2=const OR t.key2part1=const" (this is marked as DISCARD-IMERGE-3). This was justifed as the left part of the AND condition is not usable for range access, and the operation of tree_and() guaranteed that there was no way it could changed to make a usable range plan. With new tree_and() and rule A2, this is no longer the case. For example for this query: (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const it will construct a imerge(t.key1part2=const OR t.key2part1=const), range(t.key1part1=const) then tree_and() will apply rule A2 to push the range down into index merge and after that we'll have: range(t.key1part1=const) imerge( t.key1part2=const AND t.key1part1=const, t.key2part1=const ) note that imerge(...) describes a usable index_merge plan and it's possible that it will be the best access path. O2. "Create index_merge accesses when possible" Current tree_or() will not create index_merge access when it could create non-index merge access (see DISCARD-IMERGE-3 and its example in the "Problems in the current implementation" section). 
This will be changed to work as follows: we will create index_merge made for index scans that didn't have their match in the other sel_tree. Ilustrating it with an example: | sel_tree_A | sel_tree_B | A or B | include in index_merge? ------+------------+------------+--------+------------------------ key1 | cond1 | cond2 | condM | no key2 | cond3 | cond4 | NULL | no key3 | cond5 | | | yes, A-side key4 | cond6 | | | yes, A-side key5 | | cond7 | | yes, B-side key6 | | cond8 | | yes, B-side here we assume that - (cond1 OR cond2) did produce a combined range. Not including them in index_merge. - (cond3 OR cond4) didn't produce a usable range (e.g. they were t.key1part1=c1 AND t.key1part2=c1, respectively, and combining them didn't yield any range list) - All other scand didn't have their counterparts, so we'll end up with a SEL_TREE of: range(condM) AND index_merge((cond5 AND cond6),(cond7 AND cond8)) . O4. There is no O4. DISCARD-INDEX-MERGE-4 will remain there. The idea is that although DISCARD-INDEX-MERGE-4 does discard plans, so far we haven seen any complaints that could be attributed to it. If we face the need to lift DISCARD-INDEX-MERGE-4, our answer will be to lift it ,and produce a cross-product: ((key1p OR key2p) AND (key3p OR key4p)) OR ((key5p OR key6p) AND (key7p OR key8p)) = (key1p OR key2p OR key5p OR key6p) AND // this part is currently (key3p OR key4p OR key5p OR key6p) AND // produced (key1p OR key2p OR key5p OR key6p) AND // this part will be added (key3p OR key4p OR key5p OR key6p) //. In order to limit the impact of this combinatorial explosion, we will introduce a rule that we won't generate more than #defined MAX_IMERGE_OPTS options. 3. 
Testing and required coverage ================================ So far could find the following user cases: * BUG#17259: Query optimizer chooses wrong index * BUG#17673: Optimizer does not use Index Merge optimization in some cases * BUG#23322: Optimizer sometimes erroniously prefers other index over index merge * BUG#30151: optimizer is very reluctant to chose index_merge algorithm ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)
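
Rules A1-A3 of the proposed tree_and() can be sketched in C++ roughly as follows. This is only a minimal illustration, not code from the server: the types RangeTree, Imerge and SelTree2 and the helpers and_ranges(), push_down() and example_o1() are hypothetical stand-ins for SEL_TREE/SEL_IMERGE, and a textual "AND" stands in for real interval intersection.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified types; the real SEL_TREE/SEL_IMERGE are far richer.
using RangeTree = std::map<std::string, std::string>;  // key name -> condition text
using Imerge = std::vector<RangeTree>;                  // range trees ORed together

struct SelTree2 {
  RangeTree ranges;             // range part, may be empty
  std::vector<Imerge> imerges;  // index_merge part (ANDed together), may be empty
};

// AND two range parts key by key (textual AND stands in for intersection).
static RangeTree and_ranges(const RangeTree &a, const RangeTree &b) {
  RangeTree r = a;
  for (const auto &kv : b) {
    auto it = r.find(kv.first);
    if (it == r.end())
      r.insert(kv);
    else
      it->second = "(" + it->second + " AND " + kv.second + ")";
  }
  return r;
}

// Rule A2: push each range into the imerge branches that use the same key.
static std::vector<Imerge> push_down(const RangeTree &ranges,
                                     std::vector<Imerge> imerges) {
  for (Imerge &im : imerges)
    for (RangeTree &branch : im)
      for (auto &kv : branch) {
        auto it = ranges.find(kv.first);
        if (it != ranges.end())
          kv.second = "(" + kv.second + " AND " + it->second + ")";
      }
  return imerges;
}

SelTree2 tree_and(const SelTree2 &a, const SelTree2 &b) {
  SelTree2 r;
  r.ranges = and_ranges(a.ranges, b.ranges);
  // A1: keep the index_merge parts; A2: push the other side's ranges into them.
  r.imerges = push_down(b.ranges, a.imerges);
  std::vector<Imerge> from_b = push_down(a.ranges, b.imerges);
  // A3: concatenate the index_merge lists of both sides.
  r.imerges.insert(r.imerges.end(), from_b.begin(), from_b.end());
  return r;
}

// The example from rule O1:
// (t.key1part2=const OR t.key2part1=const) AND t.key1part1=const
SelTree2 example_o1() {
  RangeTree on_key1, on_key2;
  on_key1["key1"] = "t.key1part2=const";
  on_key2["key2"] = "t.key2part1=const";
  SelTree2 a, b;
  a.imerges.push_back(Imerge{on_key1, on_key2});
  b.ranges["key1"] = "t.key1part1=const";
  return tree_and(a, b);
}
```

In this sketch example_o1() keeps range(t.key1part1=const) and also yields an imerge whose key1 branch carries both key parts while the key2 branch is unchanged, which is the shape described under O1.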

[Maria-developers] New (by Psergey): Store in binlog text of statements that caused RBR events (47)
by worklog-noreply＠askmonty.org 15 Aug '09

15 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Store in binlog text of statements that caused RBR events
CREATION DATE..: Sat, 15 Aug 2009, 23:48
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 47 (http://askmonty.org/worklog/?tid=47)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

DESCRIPTION:

Store in binlog (and show in mysqlbinlog output) the texts of statements that
caused RBR events.

This is needed for (list from Monty):
- Easier to understand why updates happened
- Would make it easier to find out where in the application things went
  wrong (as you can search for exact strings)
- Allow one to filter things based on comments in the statement.

The cost of this can be that the binlog will be approximately 2x in size
(inserts of big blobs especially would be a bit painful), so this should be
an optional feature.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

[Maria-developers] Updated (by Psergey): Change BINLOG statement syntax to be human-readable (46)
by worklog-noreply＠askmonty.org 15 Aug '09

15 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Change BINLOG statement syntax to be human-readable
CREATION DATE..: Sat, 15 Aug 2009, 23:42
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 46 (http://askmonty.org/worklog/?tid=46)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Sat, 15 Aug 2009, 23:43)=-=-
High-Level Specification modified.
--- /tmp/wklog.46.old.17742  2009-08-15 23:43:09.000000000 +0300
+++ /tmp/wklog.46.new.17742  2009-08-15 23:43:09.000000000 +0300
@@ -1 +1,28 @@
+Suggestion 1
+------------
+Original syntax suggestion by Kristian:
+
+  BINLOG
+    WITH TIMESTAMP xxx SERVER_ID 1 MASTER_POS 415 FLAGS 0x0
+    TABLE db1.table1 AS 1 COLUMNS (INT NOT NULL, BLOB, VARCHAR(100)) FLAGS 0x0
+    TABLE db2.table2 AS 2 COLUMNS (CHAR(10)) FLAGS 0x0
+    WRITE_ROW INTO db1.table1(1,3) VALUES (42, 'foobar'), (10, NULL) FLAGS 0x2
+    UPDATE_ROW INTO db2.table2 (1) (1) VALUES FROM ('beforeval') TO ('toval'),
+      FROM ('a') TO ('b') FLAGS 0x0
+    DELETE_ROW INTO db2.table2 (1) VALUES ('row_to_delete') FLAGS 0x0;
+
+  This is basically a dump of what is stored in the events, and would be an
+  alternative to BINLOG 'gwWEShMBAA...'.
+
+Feedback and other suggestions
+------------------------------
+* What is the need for WITH TIMESTAMP part? Can't one use a separate
+  SET TIMESTAMP statement?
+
+* mysqlbinlog --base64-output=DECODE-ROWS --verbose already produces something
+  that's close to readable SQL. Can we make it to be regular parseable SQL?
+
+  + This will be syntax that's familiar to our parser and to the users
+  - A stream of SQL statements will be slower to run than BINLOG statements
+    (due to locking, table open/close, etc). (TODO: is it really slower? we
+    haven't checked).

DESCRIPTION:

One of the great things about mysqlbinlog was that its output was
human-readable SQL, so it was possible to edit it manually or with the help
of scripts. With RBR events and BINLOG 'DpiGShMBAAAALQAAADcBAA...' statements
this is no longer the case.

This WL task is about making BINLOG statements human-readable (either as an
option or by default).

HIGH-LEVEL SPECIFICATION:

Suggestion 1
------------
Original syntax suggestion by Kristian:

  BINLOG
    WITH TIMESTAMP xxx SERVER_ID 1 MASTER_POS 415 FLAGS 0x0
    TABLE db1.table1 AS 1 COLUMNS (INT NOT NULL, BLOB, VARCHAR(100)) FLAGS 0x0
    TABLE db2.table2 AS 2 COLUMNS (CHAR(10)) FLAGS 0x0
    WRITE_ROW INTO db1.table1(1,3) VALUES (42, 'foobar'), (10, NULL) FLAGS 0x2
    UPDATE_ROW INTO db2.table2 (1) (1) VALUES FROM ('beforeval') TO ('toval'),
      FROM ('a') TO ('b') FLAGS 0x0
    DELETE_ROW INTO db2.table2 (1) VALUES ('row_to_delete') FLAGS 0x0;

  This is basically a dump of what is stored in the events, and would be an
  alternative to BINLOG 'gwWEShMBAA...'.

Feedback and other suggestions
------------------------------
* What is the need for the WITH TIMESTAMP part? Can't one use a separate
  SET TIMESTAMP statement?

* mysqlbinlog --base64-output=DECODE-ROWS --verbose already produces something
  that's close to readable SQL. Can we make it regular parseable SQL?

  + This will be syntax that's familiar to our parser and to the users
  - A stream of SQL statements will be slower to run than BINLOG statements
    (due to locking, table open/close, etc). (TODO: is it really slower? we
    haven't checked).

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
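
To make "Suggestion 1" a bit more concrete, here is a rough C++ sketch of how one WRITE_ROW clause of the proposed syntax could be printed from an already decoded event. Everything in it is an assumption made for illustration: the struct WriteRowsSketch, its fields and the functions join(), format_write_row() and example_write_row() are hypothetical and do not correspond to existing mysqlbinlog or server code, and values are assumed to arrive already rendered as SQL literals.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one decoded write-rows event.
struct WriteRowsSketch {
  std::string table;                           // e.g. "db1.table1"
  std::vector<int> columns;                    // 1-based numbers of columns present
  std::vector<std::vector<std::string>> rows;  // values, already SQL literals
  unsigned flags;
};

// Join strings with a separator.
static std::string join(const std::vector<std::string> &parts,
                        const std::string &sep) {
  std::string out;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i)
      out += sep;
    out += parts[i];
  }
  return out;
}

// Print the event as a WRITE_ROW clause in the syntax of Suggestion 1.
std::string format_write_row(const WriteRowsSketch &ev) {
  std::vector<std::string> cols, tuples;
  for (int c : ev.columns)
    cols.push_back(std::to_string(c));
  for (const auto &row : ev.rows)
    tuples.push_back("(" + join(row, ", ") + ")");
  std::ostringstream out;
  out << "WRITE_ROW INTO " << ev.table << "(" << join(cols, ",") << ")"
      << " VALUES " << join(tuples, ", ")
      << " FLAGS 0x" << std::hex << ev.flags;
  return out.str();
}

// Rebuilds the WRITE_ROW line used in the suggestion above.
std::string example_write_row() {
  WriteRowsSketch ev;
  ev.table = "db1.table1";
  ev.columns.push_back(1);
  ev.columns.push_back(3);
  ev.rows.push_back(std::vector<std::string>{"42", "'foobar'"});
  ev.rows.push_back(std::vector<std::string>{"10", "NULL"});
  ev.flags = 0x2;
  return format_write_row(ev);
}
```

The sketch covers only the printing direction; parsing such text back into events, which the task would also need, is not shown.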

[Maria-developers] New (by Psergey): Change BINLOG statement syntax to be human-readable (46)
by worklog-noreply＠askmonty.org 15 Aug '09

15 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Change BINLOG statement syntax to be human-readable
CREATION DATE..: Sat, 15 Aug 2009, 23:42
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 46 (http://askmonty.org/worklog/?tid=46)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

DESCRIPTION:

One of the great things about mysqlbinlog was that its output was
human-readable SQL, so it was possible to edit it manually or with the help
of scripts. With RBR events and BINLOG 'DpiGShMBAAAALQAAADcBAA...' statements
this is no longer the case.

This WL task is about making BINLOG statements human-readable (either as an
option or by default).

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

[Maria-developers] Updated (by Psergey): Add a mysqlbinlog option to produce succint output (45)
by worklog-noreply＠askmonty.org 15 Aug '09

15 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to produce succint output
CREATION DATE..: Sat, 15 Aug 2009, 23:40
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 45 (http://askmonty.org/worklog/?tid=45)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Sat, 15 Aug 2009, 23:40)=-=-
Title modified.
--- /tmp/wklog.45.old.17603  2009-08-15 23:40:38.000000000 +0300
+++ /tmp/wklog.45.new.17603  2009-08-15 23:40:38.000000000 +0300
@@ -1 +1 @@
-Add a mysqlbinlog option to produce siccint output
+Add a mysqlbinlog option to produce succint output

DESCRIPTION:

Add a mysqlbinlog option to produce the most succinct output, without any
comments or other statements that are not needed to apply binlog correctly.

This will be different from --short-form option. That option causes
mysqlbinlog not to print RBR events, i.e. the output is not supposed to be
applied.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
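
As a very rough illustration of the "without any comments" part only, the C++ sketch below post-processes a text dump and drops empty lines and lines that start with '#', which is how mysqlbinlog marks its informational comment lines. This is not the proposed option itself: strip_comment_lines() and example_succinct() are hypothetical names, the sample input is made up, and the sketch makes no attempt to decide which statements are "not needed to apply binlog correctly".

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Drop empty lines and '#' comment lines from a mysqlbinlog-style text dump.
// Hypothetical helper for illustration; all other lines are kept unchanged.
std::string strip_comment_lines(const std::string &dump) {
  std::istringstream in(dump);
  std::string line, out;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#')
      continue;  // comment or blank line: not needed to apply the binlog
    out += line;
    out += '\n';
  }
  return out;
}

// Made-up sample input, only loosely shaped like mysqlbinlog output.
std::string example_succinct() {
  return strip_comment_lines(
      "# at 4\n"
      "SET TIMESTAMP=1250368800;\n"
      "\n"
      "# at 98\n"
      "COMMIT;\n");
}
```

A real implementation would presumably live inside mysqlbinlog and suppress the lines at print time rather than filtering text afterwards.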

[Maria-developers] New (by Psergey): Add a mysqlbinlog option to produce siccint output (45)
by worklog-noreply＠askmonty.org 15 Aug '09

15 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to produce siccint output
CREATION DATE..: Sat, 15 Aug 2009, 23:40
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 45 (http://askmonty.org/worklog/?tid=45)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

DESCRIPTION:

Add a mysqlbinlog option to produce the most succinct output, without any
comments or other statements that are not needed to apply binlog correctly.

This will be different from --short-form option. That option causes
mysqlbinlog not to print RBR events, i.e. the output is not supposed to be
applied.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

[Maria-developers] Rev 2724: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 15 Aug '09

15 Aug '09

At file:///home/psergey/dev/maria-5.1-table-elim-r10/

------------------------------------------------------------
revno: 2724
revision-id: psergey(a)askmonty.org-20090815153912-q47vfp1j22ilmup2
parent: psergey(a)askmonty.org-20090815121442-706m9ujn8km4u4y1
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r10
timestamp: Sat 2009-08-15 18:39:12 +0300
message:
  MWL#17: Table elimination
  - Review feedback, more variable renames
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc  2009-08-15 12:14:42 +0000
+++ b/sql/opt_table_elimination.cc  2009-08-15 15:39:12 +0000
@@ -114,7 +114,6 @@
     MODULE_EXPRESSION,
     MODULE_MULTI_EQUALITY,
     MODULE_UNIQUE_KEY,
-    MODULE_TABLE,
     MODULE_OUTER_JOIN
   } type; /* Type of the object */

@@ -138,7 +137,7 @@
 {
 public:
   Field_value *field;
-  Item *val;
+  Item *expression;

   /* Used during condition analysis only, similar to KEYUSE::level */
   uint level;
@@ -510,18 +509,18 @@
         */
         if (old->field == new_fields->field)
         {
-          if (!new_fields->val->const_item())
+          if (!new_fields->expression->const_item())
           {
             /*
               If the value matches, we can use the key reference.
               If not, we keep it until we have examined all new values
             */
-            if (old->val->eq(new_fields->val, old->field->field->binary()))
+            if (old->expression->eq(new_fields->expression, old->field->field->binary()))
             {
               old->level= and_level;
             }
           }
-          else if (old->val->eq_by_collation(new_fields->val,
+          else if (old->expression->eq_by_collation(new_fields->expression,
                                              old->field->field->binary(),
                                              old->field->field->charset()))
           {
@@ -633,7 +632,7 @@
   /* Store possible eq field */
   (*eq_dep)->type= Module_dep::MODULE_EXPRESSION; //psergey-todo;
   (*eq_dep)->field= get_field_value(te, field);
-  (*eq_dep)->val= *value;
+  (*eq_dep)->expression= *value;
   (*eq_dep)->level= and_level;
   (*eq_dep)++;
 }
@@ -953,7 +952,7 @@
   {
     deps_setter.expr_offset= eq_dep - te->equality_deps;
     eq_dep->unknown_args= 0;
-    eq_dep->val->walk(&Item::check_column_usage_processor, FALSE,
+    eq_dep->expression->walk(&Item::check_column_usage_processor, FALSE,
                       (uchar*)&deps_setter);
     if (!eq_dep->unknown_args)
     {
@@ -1283,7 +1282,7 @@
       char buf[128];
       String str(buf, sizeof(buf), &my_charset_bin);
       str.length(0);
-      eq_dep->val->print(&str, QT_ORDINARY);
+      eq_dep->expression->print(&str, QT_ORDINARY);
       fprintf(DBUG_FILE, " equality%d: %s -> %s.%s\n",
               eq_dep - te->equality_deps,
               str.c_ptr(),

[Maria-developers] Rev 2723: Fix trivial typo in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 15 Aug '09

15 Aug '09

At file:///home/psergey/dev/maria-5.1-table-elim-r10/

------------------------------------------------------------
revno: 2723
revision-id: psergey(a)askmonty.org-20090815121442-706m9ujn8km4u4y1
parent: psergey(a)askmonty.org-20090815102953-7s0jb470ibwq58qz
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r10
timestamp: Sat 2009-08-15 16:14:42 +0400
message:
  Fix trivial typo
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc  2009-08-15 10:29:53 +0000
+++ b/sql/opt_table_elimination.cc  2009-08-15 12:14:42 +0000
@@ -841,7 +841,7 @@
                              *eliminable_tables);
         }

-      if (eliminable && get_outer_join_dep(te, tbl, cur_map))
+      if (eliminable && !get_outer_join_dep(te, tbl, cur_map))
         return TRUE;

       tables_used_on_left |= tbl->on_expr->used_tables();
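
The one-character fix makes more sense next to the error convention the patch series appears to follow: pointer-returning helpers return NULL on failure, and bool-returning callers return TRUE to signal an error. The C++ sketch below imitates that convention with hypothetical stand-ins (Outer_join_dep_sketch, get_dep_sketch(), collect_sketch()); it is an illustration only and not code from sql/opt_table_elimination.cc.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for Outer_join_module.
struct Outer_join_dep_sketch {
  int n_children;
};

// Returns a new object, or nullptr on failure. The fail_alloc flag simulates
// an out-of-memory condition so the error path can be exercised.
Outer_join_dep_sketch *get_dep_sketch(bool fail_alloc, int n_children) {
  if (fail_alloc)
    return nullptr;
  return new (std::nothrow) Outer_join_dep_sketch{n_children};
}

// Returns true on error and false on success, mirroring the TRUE/FALSE
// convention of the bool-returning functions in the patch.
bool collect_sketch(bool eliminable, bool fail_alloc) {
  if (!eliminable)
    return false;  // nothing to allocate, nothing can fail
  Outer_join_dep_sketch *dep = get_dep_sketch(fail_alloc, 2);
  if (!dep)
    return true;   // nullptr means failure, hence the '!' added in this fix
  delete dep;      // the sketch owns the object; the real code keeps it
  return false;
}
```

Without the negation, a successful allocation would be reported as an error and a failed one would be ignored, which is the behaviour this revision corrects.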

[Maria-developers] Rev 2722: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 15 Aug '09

15 Aug '09

At file:///home/psergey/dev/maria-5.1-table-elim-r10/

------------------------------------------------------------
revno: 2722
revision-id: psergey(a)askmonty.org-20090815102953-7s0jb470ibwq58qz
parent: psergey(a)askmonty.org-20090815060803-0yvp5mmgo87emykp
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r10
timestamp: Sat 2009-08-15 14:29:53 +0400
message:
  MWL#17: Table elimination
  Continue with addressing review feedback part two:
  - rename enum members
  - add checking for out of memory errors on allocation
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc  2009-08-15 06:08:03 +0000
+++ b/sql/opt_table_elimination.cc  2009-08-15 10:29:53 +0000
@@ -111,11 +111,11 @@
 {
 public:
   enum {
-    FD_EXPRESSION,
-    FD_MULTI_EQUALITY,
-    FD_UNIQUE_KEY,
-    FD_TABLE,
-    FD_OUTER_JOIN
+    MODULE_EXPRESSION,
+    MODULE_MULTI_EQUALITY,
+    MODULE_UNIQUE_KEY,
+    MODULE_TABLE,
+    MODULE_OUTER_JOIN
   } type; /* Type of the object */

   /*
@@ -156,7 +156,7 @@
   Key_module(Table_value *table_arg, uint keyno_arg, uint n_parts_arg) :
     table(table_arg), keyno(keyno_arg), next_table_key(NULL)
   {
-    type= Module_dep::FD_UNIQUE_KEY;
+    type= Module_dep::MODULE_UNIQUE_KEY;
     unknown_args= n_parts_arg;
   }
   Table_value *table; /* Table this key is from */
@@ -178,7 +178,7 @@
   Outer_join_module(TABLE_LIST *table_list_arg, uint n_children) :
     table_list(table_list_arg), parent(NULL)
   {
-    type= Module_dep::FD_OUTER_JOIN;
+    type= Module_dep::MODULE_OUTER_JOIN;
     unknown_args= n_children;
   }
   /*
@@ -205,7 +205,7 @@
 class Table_elimination
 {
 public:
-  Table_elimination(JOIN *join_arg) : join(join_arg)
+  Table_elimination(JOIN *join_arg) : join(join_arg), n_outer_joins(0)
   {
     bzero(table_deps, sizeof(table_deps));
   }
@@ -220,6 +220,7 @@

   /* Outer joins that are candidates for elimination */
   List<Outer_join_module> oj_deps;
+  uint n_outer_joins;

   /* Bitmap of how expressions depend on bits */
   MY_BITMAP expr_deps;
@@ -630,22 +631,25 @@
   DBUG_ASSERT(eq_func);
   /* Store possible eq field */
-  (*eq_dep)->type= Module_dep::FD_EXPRESSION; //psergey-todo;
+  (*eq_dep)->type= Module_dep::MODULE_EXPRESSION; //psergey-todo;
   (*eq_dep)->field= get_field_value(te, field);
   (*eq_dep)->val= *value;
   (*eq_dep)->level= and_level;
   (*eq_dep)++;
 }

+
 /*
   Get a Table_value object for the given table, creating it if necessary.
 */

 static Table_value *get_table_value(Table_elimination *te, TABLE *table)
 {
-  Table_value *tbl_dep= new Table_value(table);
+  Table_value *tbl_dep;
+  if (!(tbl_dep= new Table_value(table)))
+    return NULL;
+
   Key_module **key_list= &(tbl_dep->keys);
-
   /* Add dependencies for unique keys */
   for (uint i=0; i < table->s->keys; i++)
   {
@@ -657,7 +661,7 @@
       key_list= &(key_dep->next_table_key);
     }
   }
-  return te->table_deps[table->tablenr] = tbl_dep;
+  return te->table_deps[table->tablenr]= tbl_dep;
 }


@@ -672,7 +676,10 @@

   /* First, get the table*/
   if (!(tbl_dep= te->table_deps[table->tablenr]))
-    tbl_dep= get_table_value(te, table);
+  {
+    if (!(tbl_dep= get_table_value(te, table)))
+      return NULL;
+  }

   /* Try finding the field in field list */
   Field_value **pfield= &(tbl_dep->fields);
@@ -702,10 +709,12 @@

 static
 Outer_join_module *get_outer_join_dep(Table_elimination *te,
-                                      TABLE_LIST *outer_join, table_map deps_map)
+                                      TABLE_LIST *outer_join,
+                                      table_map deps_map)
 {
   Outer_join_module *oj_dep;
   oj_dep= new Outer_join_module(outer_join, my_count_bits(deps_map));
+  te->n_outer_joins++;

   /*
     Collect a bitmap fo tables that we depend on, and also set parent pointer
@@ -734,7 +743,8 @@
         }
       }
       DBUG_ASSERT(table);
-      table_dep= get_table_value(te, table);
+      if (!(table_dep= get_table_value(te, table)))
+        return NULL;
     }

     /*
@@ -781,7 +791,7 @@
   .
 */

-static void
+static bool
 collect_funcdeps_for_join_list(Table_elimination *te,
                                List<TABLE_LIST> *join_list,
                                bool build_eq_deps,
@@ -808,11 +818,12 @@
         eliminable= !(cur_map & outside_used_tables);
         if (eliminable)
           *eliminable_tables |= cur_map;
-        collect_funcdeps_for_join_list(te, &tbl->nested_join->join_list,
-                                       eliminable || build_eq_deps,
-                                       outside_used_tables,
-                                       eliminable_tables,
-                                       eq_dep);
+        if (collect_funcdeps_for_join_list(te, &tbl->nested_join->join_list,
+                                           eliminable || build_eq_deps,
+                                           outside_used_tables,
+                                           eliminable_tables,
+                                           eq_dep))
+          return TRUE;
       }
       else
       {
@@ -830,13 +841,13 @@
                              *eliminable_tables);
         }

-      if (eliminable)
-        te->oj_deps.push_back(get_outer_join_dep(te, tbl, cur_map));
+      if (eliminable && get_outer_join_dep(te, tbl, cur_map))
+        return TRUE;

       tables_used_on_left |= tbl->on_expr->used_tables();
     }
   }
-  return;
+  return FALSE;
 }


@@ -1053,16 +1064,18 @@
     DBUG_VOID_RETURN;
   Equality_module *eq_deps_end= te.equality_deps;
   table_map eliminable_tables= 0;
-  collect_funcdeps_for_join_list(&te, join->join_list,
-                                 FALSE,
-                                 used_tables,
-                                 &eliminable_tables,
-                                 &eq_deps_end);
+  if (collect_funcdeps_for_join_list(&te, join->join_list,
+                                     FALSE,
+                                     used_tables,
+                                     &eliminable_tables,
+                                     &eq_deps_end))
+    DBUG_VOID_RETURN;
   te.n_equality_deps= eq_deps_end - te.equality_deps;

   Module_dep *bound_modules;
   //Value_dep *bound_values;
-  setup_equality_deps(&te, &bound_modules);
+  if (setup_equality_deps(&te, &bound_modules))
+    DBUG_VOID_RETURN;

   run_elimination_wave(&te, bound_modules);
 }
@@ -1108,7 +1121,7 @@
   {
     switch (bound_modules->type)
     {
-      case Module_dep::FD_EXPRESSION:
+      case Module_dep::MODULE_EXPRESSION:
       {
         /* It's a field=expr and we got to know the expr, so we know the field */
         Equality_module *eq_dep= (Equality_module*)bound_modules;
@@ -1121,7 +1134,7 @@
         }
         break;
       }
-      case Module_dep::FD_UNIQUE_KEY:
+      case Module_dep::MODULE_UNIQUE_KEY:
       {
         /* Unique key is known means the table is known */
         Table_value *table_dep=((Key_module*)bound_modules)->table;
@@ -1134,13 +1147,13 @@
         }
         break;
       }
-      case Module_dep::FD_OUTER_JOIN:
+      case Module_dep::MODULE_OUTER_JOIN:
       {
         Outer_join_module *outer_join_dep= (Outer_join_module*)bound_modules;
         mark_as_eliminated(te->join, outer_join_dep->table_list);
         break;
       }
-      case Module_dep::FD_MULTI_EQUALITY:
+      case Module_dep::MODULE_MULTI_EQUALITY:
       default:
         DBUG_ASSERT(0);
     }


[Maria-developers] Rev 2721: MWL#17: Address 2nd post-review feedback in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 15 Aug '09


At file:///home/psergey/dev/maria-5.1-table-elim-r10/ ------------------------------------------------------------ revno: 2721 revision-id: psergey(a)askmonty.org-20090815060803-0yvp5mmgo87emykp parent: psergey(a)askmonty.org-20090813211212-jghejwxsl6adtopl committer: Sergey Petrunya <psergey(a)askmonty.org> branch nick: maria-5.1-table-elim-r10 timestamp: Sat 2009-08-15 10:08:03 +0400 message: MWL#17: Address 2nd post-review feedback - Switch from uniform graph to bipartite graph with two kinds of nodes: "values" (tables and fields) and "modules" (t.col=func(...) equalities, multi-equalities, unique keys, inner sides of outer joins). - Rename functions, classes, etc. === modified file 'sql/opt_table_elimination.cc' --- a/sql/opt_table_elimination.cc 2009-08-13 20:44:52 +0000 +++ b/sql/opt_table_elimination.cc 2009-08-15 06:08:03 +0000 @@ -40,19 +40,78 @@ Table elimination is redone on every PS re-execution. */ - -/* - An abstract structure that represents some entity that's being dependent on - some other entity. -*/ - -class Func_dep : public Sql_alloc -{ -public: - enum { - FD_INVALID, +class Value_dep +{ +public: + enum { + VALUE_FIELD, + VALUE_TABLE, + } type; /* Type of the object */ + + bool bound; + Value_dep *next; +}; + +class Field_value; +class Table_value; +class Outer_join_module; +class Key_module; + +/* + A table field. There is only one such object for any tblX.fieldY + - the field epends on its table and equalities + - expressions that use the field are its dependencies +*/ +class Field_value : public Value_dep +{ +public: + Field_value(Table_value *table_arg, Field *field_arg) : + table(table_arg), field(field_arg) + { + type= Value_dep::VALUE_FIELD; + } + + Table_value *table; /* Table this field is from */ + Field *field; + + /* + Field_deps that belong to one table form a linked list. 
list members are + ordered by field_index + */ + Field_value *next_table_field; + uint bitmap_offset; /* Offset of our part of the bitmap */ +}; + + +/* + A table. + - table depends on any of its unique keys + - has its fields and embedding outer join as dependency. +*/ +class Table_value : public Value_dep +{ +public: + Table_value(TABLE *table_arg) : + table(table_arg), fields(NULL), keys(NULL), outer_join_dep(NULL) + { + type= Value_dep::VALUE_TABLE; + } + TABLE *table; + Field_value *fields; /* Ordered list of fields that belong to this table */ + Key_module *keys; /* Ordered list of Unique keys in this table */ + Outer_join_module *outer_join_dep; /* Innermost eliminable outer join we're in */ +}; + + +/* + A 'module' +*/ + +class Module_dep : public Sql_alloc +{ +public: + enum { FD_EXPRESSION, - FD_FIELD, FD_MULTI_EQUALITY, FD_UNIQUE_KEY, FD_TABLE, @@ -63,58 +122,26 @@ Used to make a linked list of elements that became bound and thus can make elements that depend on them bound, too. */ - Func_dep *next; - bool bound; /* TRUE<=> The entity is considered bound */ - Func_dep() : next(NULL), bound(FALSE) {} + Module_dep *next; + uint unknown_args; /* TRUE<=> The entity is considered bound */ + + Module_dep() : next(NULL), unknown_args(0) {} }; -class Field_dep; -class Table_dep; -class Outer_join_dep; - /* A "tbl.column= expr" equality dependency. tbl.column depends on fields used in expr. */ -class Equality_dep : public Func_dep +class Equality_module : public Module_dep { public: - Field_dep *field; + Field_value *field; Item *val; /* Used during condition analysis only, similar to KEYUSE::level */ uint level; - - /* Number of fields referenced from *val that are not yet 'bound' */ - uint unknown_args; -}; - - -/* - A table field. 
There is only one such object for any tblX.fieldY - - the field epends on its table and equalities - - expressions that use the field are its dependencies -*/ -class Field_dep : public Func_dep -{ -public: - Field_dep(Table_dep *table_arg, Field *field_arg) : - table(table_arg), field(field_arg) - { - type= Func_dep::FD_FIELD; - } - - Table_dep *table; /* Table this field is from */ - Field *field; - - /* - Field_deps that belong to one table form a linked list. list members are - ordered by field_index - */ - Field_dep *next_table_field; - uint bitmap_offset; /* Offset of our part of the bitmap */ }; @@ -123,41 +150,21 @@ - Unique key depends on all of its components - Key's table is its dependency */ -class Key_dep: public Func_dep +class Key_module: public Module_dep { public: - Key_dep(Table_dep *table_arg, uint keyno_arg, uint n_parts_arg) : - table(table_arg), keyno(keyno_arg), n_missing_keyparts(n_parts_arg), - next_table_key(NULL) + Key_module(Table_value *table_arg, uint keyno_arg, uint n_parts_arg) : + table(table_arg), keyno(keyno_arg), next_table_key(NULL) { - type= Func_dep::FD_UNIQUE_KEY; + type= Module_dep::FD_UNIQUE_KEY; + unknown_args= n_parts_arg; } - Table_dep *table; /* Table this key is from */ + Table_value *table; /* Table this key is from */ uint keyno; - uint n_missing_keyparts; /* Unique keys form a linked list, ordered by keyno */ - Key_dep *next_table_key; -}; - - -/* - A table. - - table depends on any of its unique keys - - has its fields and embedding outer join as dependency. 
-*/ -class Table_dep : public Func_dep -{ -public: - Table_dep(TABLE *table_arg) : - table(table_arg), fields(NULL), keys(NULL), outer_join_dep(NULL) - { - type= Func_dep::FD_TABLE; - } - TABLE *table; - Field_dep *fields; /* Ordered list of fields that belong to this table */ - Key_dep *keys; /* Ordered list of Unique keys in this table */ - Outer_join_dep *outer_join_dep; /* Innermost eliminable outer join we're in */ -}; + Key_module *next_table_key; +}; + /* @@ -165,14 +172,14 @@ - it depends on all tables inside it - has its parent outer join as dependency */ -class Outer_join_dep: public Func_dep +class Outer_join_module: public Module_dep { public: - Outer_join_dep(TABLE_LIST *table_list_arg, table_map missing_tables_arg) : - table_list(table_list_arg), missing_tables(missing_tables_arg), - all_tables(missing_tables_arg), parent(NULL) + Outer_join_module(TABLE_LIST *table_list_arg, uint n_children) : + table_list(table_list_arg), parent(NULL) { - type= Func_dep::FD_OUTER_JOIN; + type= Module_dep::FD_OUTER_JOIN; + unknown_args= n_children; } /* Outer join we're representing. This can be a join nest or a one table that @@ -184,11 +191,11 @@ Tables within this outer join (and its descendants) that are not yet known to be functionally dependent. */ - table_map missing_tables; + table_map missing_tables; //psergey-todo: remove /* All tables within this outer join and its descendants */ - table_map all_tables; + table_map all_tables; //psergey-todo: remove /* Parent eliminable outer join, if any */ - Outer_join_dep *parent; + Outer_join_module *parent; }; @@ -205,44 +212,45 @@ JOIN *join; /* Array of equality dependencies */ - Equality_dep *equality_deps; + Equality_module *equality_deps; uint n_equality_deps; /* Number of elements in the array */ - /* tablenr -> Table_dep* mapping. */ - Table_dep *table_deps[MAX_KEY]; + /* tablenr -> Table_value* mapping. 
*/ + Table_value *table_deps[MAX_KEY]; /* Outer joins that are candidates for elimination */ - List<Outer_join_dep> oj_deps; + List<Outer_join_module> oj_deps; /* Bitmap of how expressions depend on bits */ MY_BITMAP expr_deps; }; - static -void build_eq_deps_for_cond(Table_elimination *te, Equality_dep **fdeps, +void build_eq_deps_for_cond(Table_elimination *te, Equality_module **fdeps, uint *and_level, Item *cond, table_map usable_tables); static void add_eq_dep(Table_elimination *te, - Equality_dep **eq_dep, uint and_level, + Equality_module **eq_dep, uint and_level, Item_func *cond, Field *field, bool eq_func, Item **value, uint num_values, table_map usable_tables); static -Equality_dep *merge_func_deps(Equality_dep *start, Equality_dep *new_fields, - Equality_dep *end, uint and_level); - -static Table_dep *get_table_dep(Table_elimination *te, TABLE *table); -static Field_dep *get_field_dep(Table_elimination *te, Field *field); - +Equality_module *merge_func_deps(Equality_module *start, Equality_module *new_fields, + Equality_module *end, uint and_level); + +static Table_value *get_table_value(Table_elimination *te, TABLE *table); +static Field_value *get_field_value(Table_elimination *te, Field *field); +static +void run_elimination_wave(Table_elimination *te, Module_dep *bound_modules); void eliminate_tables(JOIN *join); static void mark_as_eliminated(JOIN *join, TABLE_LIST *tbl); +#if 0 #ifndef DBUG_OFF static void dbug_print_deps(Table_elimination *te); #endif - +#endif /*******************************************************************************************/ /* @@ -262,14 +270,14 @@ */ static -void build_eq_deps_for_cond(Table_elimination *te, Equality_dep **fdeps, +void build_eq_deps_for_cond(Table_elimination *te, Equality_module **fdeps, uint *and_level, Item *cond, table_map usable_tables) { if (cond->type() == Item_func::COND_ITEM) { List_iterator_fast<Item> li(*((Item_cond*) cond)->argument_list()); - Equality_dep *org_key_fields= *fdeps; + 
Equality_module *org_key_fields= *fdeps; /* AND/OR */ if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC) @@ -293,7 +301,7 @@ Item *item; while ((item=li++)) { - Equality_dep *start_key_fields= *fdeps; + Equality_module *start_key_fields= *fdeps; (*and_level)++; build_eq_deps_for_cond(te, fdeps, and_level, item, usable_tables); *fdeps= merge_func_deps(org_key_fields, start_key_fields, *fdeps, @@ -432,7 +440,7 @@ /* - Perform an OR operation on two (adjacent) Equality_dep arrays. + Perform an OR operation on two (adjacent) Equality_module arrays. SYNOPSIS merge_func_deps() @@ -442,7 +450,7 @@ and_level AND-level. DESCRIPTION - This function is invoked for two adjacent arrays of Equality_dep elements: + This function is invoked for two adjacent arrays of Equality_module elements: $LEFT_PART $RIGHT_PART +-----------------------+-----------------------+ @@ -477,19 +485,19 @@ */ static -Equality_dep *merge_func_deps(Equality_dep *start, Equality_dep *new_fields, - Equality_dep *end, uint and_level) +Equality_module *merge_func_deps(Equality_module *start, Equality_module *new_fields, + Equality_module *end, uint and_level) { if (start == new_fields) return start; // Impossible or if (new_fields == end) return start; // No new fields, skip all - Equality_dep *first_free=new_fields; + Equality_module *first_free=new_fields; for (; new_fields != end ; new_fields++) { - for (Equality_dep *old=start ; old != first_free ; old++) + for (Equality_module *old=start ; old != first_free ; old++) { /* TODO: does it make sense to attempt to merging multiple-equalities? @@ -534,7 +542,7 @@ Ok, the results are within the [start, first_free) range, and the useful elements have level==and_level. 
Now, lets remove all unusable elements: */ - for (Equality_dep *old=start ; old != first_free ;) + for (Equality_module *old=start ; old != first_free ;) { if (old->level != and_level) { // Not used in all levels @@ -550,14 +558,14 @@ /* - Add an Equality_dep element for a given predicate, if applicable + Add an Equality_module element for a given predicate, if applicable DESCRIPTION This function is modeled after add_key_field(). */ static -void add_eq_dep(Table_elimination *te, Equality_dep **eq_dep, +void add_eq_dep(Table_elimination *te, Equality_module **eq_dep, uint and_level, Item_func *cond, Field *field, bool eq_func, Item **value, uint num_values, table_map usable_tables) @@ -622,22 +630,21 @@ DBUG_ASSERT(eq_func); /* Store possible eq field */ - (*eq_dep)->type= Func_dep::FD_EXPRESSION; //psergey-todo; - (*eq_dep)->field= get_field_dep(te, field); + (*eq_dep)->type= Module_dep::FD_EXPRESSION; //psergey-todo; + (*eq_dep)->field= get_field_value(te, field); (*eq_dep)->val= *value; (*eq_dep)->level= and_level; (*eq_dep)++; } - /* - Get a Table_dep object for the given table, creating it if necessary. + Get a Table_value object for the given table, creating it if necessary. 
*/ -static Table_dep *get_table_dep(Table_elimination *te, TABLE *table) +static Table_value *get_table_value(Table_elimination *te, TABLE *table) { - Table_dep *tbl_dep= new Table_dep(table); - Key_dep **key_list= &(tbl_dep->keys); + Table_value *tbl_dep= new Table_value(table); + Key_module **key_list= &(tbl_dep->keys); /* Add dependencies for unique keys */ for (uint i=0; i < table->s->keys; i++) @@ -645,7 +652,7 @@ KEY *key= table->key_info + i; if ((key->flags & (HA_NOSAME | HA_END_SPACE_KEY)) == HA_NOSAME) { - Key_dep *key_dep= new Key_dep(tbl_dep, i, key->key_parts); + Key_module *key_dep= new Key_module(tbl_dep, i, key->key_parts); *key_list= key_dep; key_list= &(key_dep->next_table_key); } @@ -655,20 +662,20 @@ /* - Get a Field_dep object for the given field, creating it if necessary + Get a Field_value object for the given field, creating it if necessary */ -static Field_dep *get_field_dep(Table_elimination *te, Field *field) +static Field_value *get_field_value(Table_elimination *te, Field *field) { TABLE *table= field->table; - Table_dep *tbl_dep; + Table_value *tbl_dep; /* First, get the table*/ if (!(tbl_dep= te->table_deps[table->tablenr])) - tbl_dep= get_table_dep(te, table); + tbl_dep= get_table_value(te, table); /* Try finding the field in field list */ - Field_dep **pfield= &(tbl_dep->fields); + Field_value **pfield= &(tbl_dep->fields); while (*pfield && (*pfield)->field->field_index < field->field_index) { pfield= &((*pfield)->next_table_field); @@ -677,7 +684,7 @@ return *pfield; /* Create the field and insert it in the list */ - Field_dep *new_field= new Field_dep(tbl_dep, field); + Field_value *new_field= new Field_value(tbl_dep, field); new_field->next_table_field= *pfield; *pfield= new_field; @@ -686,19 +693,19 @@ /* - Create an Outer_join_dep object for the given outer join + Create an Outer_join_module object for the given outer join DESCRIPTION - Outer_join_dep objects for children (or further descendants) are always + Outer_join_module 
objects for children (or further descendants) are always created before the parents. */ static -Outer_join_dep *get_outer_join_dep(Table_elimination *te, +Outer_join_module *get_outer_join_dep(Table_elimination *te, TABLE_LIST *outer_join, table_map deps_map) { - Outer_join_dep *oj_dep; - oj_dep= new Outer_join_dep(outer_join, deps_map); + Outer_join_module *oj_dep; + oj_dep= new Outer_join_module(outer_join, my_count_bits(deps_map)); /* Collect a bitmap fo tables that we depend on, and also set parent pointer @@ -708,7 +715,7 @@ int idx; while ((idx= it.next_bit()) != Table_map_iterator::BITMAP_END) { - Table_dep *table_dep; + Table_value *table_dep; if (!(table_dep= te->table_deps[idx])) { /* @@ -727,23 +734,24 @@ } } DBUG_ASSERT(table); - table_dep= get_table_dep(te, table); + table_dep= get_table_value(te, table); } /* Walk from the table up to its embedding outer joins. The goal is to find the least embedded outer join nest and set its parent pointer to - point to the newly created Outer_join_dep. + point to the newly created Outer_join_module. to set the pointer of its near */ if (!table_dep->outer_join_dep) table_dep->outer_join_dep= oj_dep; else { - Outer_join_dep *oj= table_dep->outer_join_dep; + Outer_join_module *oj= table_dep->outer_join_dep; while (oj->parent) oj= oj->parent; - oj->parent=oj_dep; + if (oj != oj_dep) + oj->parent=oj_dep; } } return oj_dep; @@ -757,7 +765,7 @@ collect_funcdeps_for_join_list() te Table elimination context. 
join_list Join list to work on - build_eq_deps TRUE <=> build Equality_dep elements for all + build_eq_deps TRUE <=> build Equality_module elements for all members of the join list, even if they cannot be individually eliminated tables_used_elsewhere Bitmap of tables that are referred to from @@ -779,7 +787,7 @@ bool build_eq_deps, table_map tables_used_elsewhere, table_map *eliminable_tables, - Equality_dep **eq_dep) + Equality_module **eq_dep) { TABLE_LIST *tbl; List_iterator<TABLE_LIST> it(*join_list); @@ -845,10 +853,10 @@ void see_field(Field *field) { - Table_dep *tbl_dep; + Table_value *tbl_dep; if ((tbl_dep= te->table_deps[field->table->tablenr])) { - for (Field_dep *field_dep= tbl_dep->fields; field_dep; + for (Field_value *field_dep= tbl_dep->fields; field_dep; field_dep= field_dep->next_table_field) { if (field->field_index == field_dep->field->field_index) @@ -888,21 +896,21 @@ */ static -bool setup_equality_deps(Table_elimination *te, Func_dep **bound_deps_list) +bool setup_equality_deps(Table_elimination *te, Module_dep **bound_deps_list) { DBUG_ENTER("setup_equality_deps"); /* - Count Field_dep objects and assign each of them a unique bitmap_offset. + Count Field_value objects and assign each of them a unique bitmap_offset. */ uint offset= 0; - for (Table_dep **tbl_dep=te->table_deps; + for (Table_value **tbl_dep=te->table_deps; tbl_dep < te->table_deps + MAX_TABLES; tbl_dep++) { if (*tbl_dep) { - for (Field_dep *field_dep= (*tbl_dep)->fields; + for (Field_value *field_dep= (*tbl_dep)->fields; field_dep; field_dep= field_dep->next_table_field) { @@ -926,9 +934,9 @@ Also collect a linked list of equalities that are bound. 
*/ - Func_dep *bound_dep= NULL; + Module_dep *bound_dep= NULL; Field_dependency_setter deps_setter(te); - for (Equality_dep *eq_dep= te->equality_deps; + for (Equality_module *eq_dep= te->equality_deps; eq_dep < te->equality_deps + te->n_equality_deps; eq_dep++) { @@ -940,12 +948,11 @@ { eq_dep->next= bound_dep; bound_dep= eq_dep; - eq_dep->bound= TRUE; } } *bound_deps_list= bound_dep; - DBUG_EXECUTE("test", dbug_print_deps(te); ); + //DBUG_EXECUTE("test", dbug_print_deps(te); ); DBUG_RETURN(FALSE); } @@ -1042,9 +1049,9 @@ uint m= max(thd->lex->current_select->max_equal_elems,1); uint max_elems= ((thd->lex->current_select->cond_count+1)*2 + thd->lex->current_select->between_count)*m + 1 + 10; - if (!(te.equality_deps= new Equality_dep[max_elems])) + if (!(te.equality_deps= new Equality_module[max_elems])) DBUG_VOID_RETURN; - Equality_dep *eq_deps_end= te.equality_deps; + Equality_module *eq_deps_end= te.equality_deps; table_map eliminable_tables= 0; collect_funcdeps_for_join_list(&te, join->join_list, FALSE, @@ -1052,96 +1059,125 @@ &eliminable_tables, &eq_deps_end); te.n_equality_deps= eq_deps_end - te.equality_deps; - Func_dep *bound_dep; - setup_equality_deps(&te, &bound_dep); - - /* - Run the wave. - All Func_dep-derived objects are divided into three classes: - - Those that have bound=FALSE - - Those that have bound=TRUE - - Those that have bound=TRUE and are in the list.. 
- - */ - while (bound_dep) - { - Func_dep *next= bound_dep->next; - //e= list.remove_first(); - switch (bound_dep->type) + + Module_dep *bound_modules; + //Value_dep *bound_values; + setup_equality_deps(&te, &bound_modules); + + run_elimination_wave(&te, bound_modules); + } + DBUG_VOID_RETURN; +} + + +static +void signal_from_field_to_exprs(Table_elimination* te, Field_value *field_dep, + Module_dep **bound_modules) +{ + /* Now, expressions */ + for (uint i=0; i < te->n_equality_deps; i++) + { + if (bitmap_is_set(&te->expr_deps, field_dep->bitmap_offset + i) && + te->equality_deps[i].unknown_args && + !--te->equality_deps[i].unknown_args) + { + /* Mark as bound and add to the list */ + Equality_module* eq_dep= &te->equality_deps[i]; + eq_dep->next= *bound_modules; + *bound_modules= eq_dep; + } + } +} + + +static +void run_elimination_wave(Table_elimination *te, Module_dep *bound_modules) +{ + Value_dep *bound_values= NULL; + /* + Run the wave. + All Func_dep-derived objects are divided into three classes: + - Those that have bound=FALSE + - Those that have bound=TRUE + - Those that have bound=TRUE and are in the list.. 
+ + */ + while (bound_modules) + { + for (;bound_modules; bound_modules= bound_modules->next) + { + switch (bound_modules->type) { - case Func_dep::FD_EXPRESSION: + case Module_dep::FD_EXPRESSION: { /* It's a field=expr and we got to know the expr, so we know the field */ - Equality_dep *eq_dep= (Equality_dep*)bound_dep; + Equality_module *eq_dep= (Equality_module*)bound_modules; if (!eq_dep->field->bound) { /* Mark as bound and add to the list */ eq_dep->field->bound= TRUE; - eq_dep->field->next= next; - next= eq_dep->field; - } - break; - } - case Func_dep::FD_FIELD: + eq_dep->field->next= bound_values; + bound_values= eq_dep->field; + } + break; + } + case Module_dep::FD_UNIQUE_KEY: + { + /* Unique key is known means the table is known */ + Table_value *table_dep=((Key_module*)bound_modules)->table; + if (!table_dep->bound) + { + /* Mark as bound and add to the list */ + table_dep->bound= TRUE; + table_dep->next= bound_values; + bound_values= table_dep; + } + break; + } + case Module_dep::FD_OUTER_JOIN: + { + Outer_join_module *outer_join_dep= (Outer_join_module*)bound_modules; + mark_as_eliminated(te->join, outer_join_dep->table_list); + break; + } + case Module_dep::FD_MULTI_EQUALITY: + default: + DBUG_ASSERT(0); + } + } + + for (;bound_values; bound_values=bound_values->next) + { + switch (bound_values->type) + { + case Value_dep::VALUE_FIELD: { /* Field became known. Check out - unique keys we belong to - expressions that depend on us. 
*/ - Field_dep *field_dep= (Field_dep*)bound_dep; - for (Key_dep *key_dep= field_dep->table->keys; key_dep; + Field_value *field_dep= (Field_value*)bound_values; + for (Key_module *key_dep= field_dep->table->keys; key_dep; key_dep= key_dep->next_table_key) { DBUG_PRINT("info", ("key %s.%s is now bound", key_dep->table->table->alias, key_dep->table->table->key_info[key_dep->keyno].name)); if (field_dep->field->part_of_key.is_set(key_dep->keyno) && - !key_dep->bound) - { - if (!--key_dep->n_missing_keyparts) - { - /* Mark as bound and add to the list */ - key_dep->bound= TRUE; - key_dep->next= next; - next= key_dep; - } - } - } - - /* Now, expressions */ - for (uint i=0; i < te.n_equality_deps; i++) - { - if (bitmap_is_set(&te.expr_deps, field_dep->bitmap_offset + i)) - { - Equality_dep* eq_dep= &te.equality_deps[i]; - if (!--eq_dep->unknown_args) - { - /* Mark as bound and add to the list */ - eq_dep->bound= TRUE; - eq_dep->next= next; - next= eq_dep; - } - } - } - break; - } - case Func_dep::FD_UNIQUE_KEY: - { - /* Unique key is known means the table is known */ - Table_dep *table_dep=((Key_dep*)bound_dep)->table; - if (!table_dep->bound) - { - /* Mark as bound and add to the list */ - table_dep->bound= TRUE; - table_dep->next= next; - next= table_dep; - } - break; - } - case Func_dep::FD_TABLE: - { - Table_dep *table_dep=(Table_dep*)bound_dep; + key_dep->unknown_args && !--key_dep->unknown_args) + { + /* Mark as bound and add to the list */ + key_dep->next= bound_modules; + bound_modules= key_dep; + } + } + signal_from_field_to_exprs(te, field_dep, &bound_modules); + break; + } + case Value_dep::VALUE_TABLE: + { + Table_value *table_dep=(Table_value*)bound_values; DBUG_PRINT("info", ("table %s is now bound", table_dep->table->alias)); /* @@ -1149,50 +1185,35 @@ - all its fields are known - one more element in outer join nest is known */ - for (Field_dep *field_dep= table_dep->fields; field_dep; + for (Field_value *field_dep= table_dep->fields; field_dep; 
field_dep= field_dep->next_table_field) { if (!field_dep->bound) { /* Mark as bound and add to the list */ field_dep->bound= TRUE; - field_dep->next= next; - next= field_dep; - } - } - Outer_join_dep *outer_join_dep= table_dep->outer_join_dep; - if (!(outer_join_dep->missing_tables &= ~table_dep->table->map)) - { - /* Mark as bound and add to the list */ - outer_join_dep->bound= TRUE; - outer_join_dep->next= next; - next= outer_join_dep; - } - break; - } - case Func_dep::FD_OUTER_JOIN: - { - Outer_join_dep *outer_join_dep= (Outer_join_dep*)bound_dep; - mark_as_eliminated(te.join, outer_join_dep->table_list); - Outer_join_dep *parent= outer_join_dep->parent; - if (parent && - !(parent->missing_tables &= ~outer_join_dep->all_tables)) - { - /* Mark as bound and add to the list */ - parent->bound= TRUE; - parent->next= next; - next= parent; - } - break; - } - case Func_dep::FD_MULTI_EQUALITY: - default: + signal_from_field_to_exprs(te, field_dep, &bound_modules); + } + } + for (Outer_join_module *outer_join_dep= table_dep->outer_join_dep; + outer_join_dep; outer_join_dep= outer_join_dep->parent) + { + //if (!(outer_join_dep->missing_tables &= ~table_dep->table->map)) + if (outer_join_dep->unknown_args && + !--outer_join_dep->unknown_args) + { + /* Mark as bound and add to the list */ + outer_join_dep->next= bound_modules; + bound_modules= outer_join_dep; + } + } + break; + } + default: DBUG_ASSERT(0); } - bound_dep= next; } } - DBUG_VOID_RETURN; } @@ -1232,7 +1253,7 @@ } - +#if 0 #ifndef DBUG_OFF static void dbug_print_deps(Table_elimination *te) @@ -1243,7 +1264,7 @@ fprintf(DBUG_FILE,"deps {\n"); /* Start with printing equalities */ - for (Equality_dep *eq_dep= te->equality_deps; + for (Equality_module *eq_dep= te->equality_deps; eq_dep != te->equality_deps + te->n_equality_deps; eq_dep++) { char buf[128]; @@ -1261,13 +1282,13 @@ /* Then tables and their fields */ for (uint i=0; i < MAX_TABLES; i++) { - Table_dep *table_dep; + Table_value *table_dep; if ((table_dep= 
te->table_deps[i])) { /* Print table */ fprintf(DBUG_FILE, " table %s\n", table_dep->table->alias); /* Print fields */ - for (Field_dep *field_dep= table_dep->fields; field_dep; + for (Field_value *field_dep= table_dep->fields; field_dep; field_dep= field_dep->next_table_field) { fprintf(DBUG_FILE, " field %s.%s ->", table_dep->table->alias, @@ -1288,7 +1309,7 @@ } #endif - +#endif /** @} (end of group Table_Elimination) */
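The commit message above describes a bipartite graph of "values" and "modules", where each module counts its not-yet-known inputs in `unknown_args` and becomes bound when the counter reaches zero. The counting scheme on its own can be sketched roughly as follows. This is a simplified illustration, not the code from sql/opt_table_elimination.cc: `Value`, `Module`, `add_input` and `run_wave` are hypothetical names, and the sketch leaves out the table-to-fields step, outer join nesting and the expression bitmap.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the patch's Value_dep / Module_dep classes.
struct Module;

struct Value {
  bool bound = false;
  std::vector<Module*> used_by;    // modules that take this value as an input
};

struct Module {
  unsigned unknown_args = 0;       // inputs that are not bound yet
  std::vector<Value*> determines;  // values that become known once the module is bound
};

// Register value `v` as an input of module `m`.
inline void add_input(Module &m, Value &v) {
  v.used_by.push_back(&m);
  ++m.unknown_args;
}

// Alternate between the two node kinds until nothing new becomes bound.
// `bound_modules` holds the modules that are bound from the start
// (for example a "t.col = constant" equality). Returns the number of
// values that became bound.
inline std::size_t run_wave(std::vector<Module*> bound_modules) {
  std::vector<Value*> bound_values;
  std::size_t newly_bound = 0;
  while (!bound_modules.empty()) {
    // Modules -> values: a bound module makes its outputs known.
    for (Module *m : bound_modules) {
      for (Value *v : m->determines) {
        if (!v->bound) {
          v->bound = true;
          bound_values.push_back(v);
          ++newly_bound;
        }
      }
    }
    bound_modules.clear();
    // Values -> modules: each newly bound value decrements the counter of
    // every module that uses it; a module is queued when it reaches zero.
    for (Value *v : bound_values) {
      for (Module *m : v->used_by) {
        if (m->unknown_args && --m->unknown_args == 0)
          bound_modules.push_back(m);
      }
    }
    bound_values.clear();
  }
  return newly_bound;
}
```

The `unknown_args && --unknown_args == 0` test mirrors the check used in the patch, so a module is queued exactly once, at the moment its last input becomes known. A two-part unique key with only one bound part therefore never reaches the queue, and its table stays unbound.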


[Maria-developers] Updated (by Guest): improving mysqlbinlog output and doing rename (39)
by worklog-noreply＠askmonty.org 14 Aug '09


-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: improving mysqlbinlog output and doing rename
CREATION DATE..: Sun, 09 Aug 2009, 12:24
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-RawIdeaBin
TASK ID........: 39 (http://askmonty.org/worklog/?tid=39)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 17
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Fri, 14 Aug 2009, 15:52)=-=-
Title modified.
--- /tmp/wklog.39.old.11123 2009-08-14 15:52:29.000000000 +0300
+++ /tmp/wklog.39.new.11123 2009-08-14 15:52:29.000000000 +0300
@@ -1 +1 @@
-Replication tasks
+improving mysqlbinlog output and doing rename

-=-=(Guest - Mon, 10 Aug 2009, 16:32)=-=-
Adding 1 hour for Monty's initial work on starting the architecture review.

Worked 1 hour and estimate 0 hours remain (original estimate increased by 1 hour).

-=-=(Psergey - Mon, 10 Aug 2009, 15:59)=-=-
Re-searched and added subtasks.

Worked 16 hours and estimate 0 hours remain (original estimate increased by 16 hours).

-=-=(Psergey - Mon, 10 Aug 2009, 15:31)=-=-
Dependency created: 39 now depends on 41

-=-=(Guest - Mon, 10 Aug 2009, 14:52)=-=-
Dependency created: 39 now depends on 40

-=-=(Psergey - Sun, 09 Aug 2009, 12:27)=-=-
Dependency created: 39 now depends on 36

-=-=(Psergey - Sun, 09 Aug 2009, 12:24)=-=-
Dependency created: 39 now depends on 38

-=-=(Psergey - Sun, 09 Aug 2009, 12:24)=-=-
Dependency created: 39 now depends on 37

DESCRIPTION:

A combine task for all replication tasks.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)


[Maria-developers] Updated (by Knielsen): Add a mysqlbinlog option to filter updates to certain tables (40)
by worklog-noreply＠askmonty.org 14 Aug '09


-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter updates to certain tables
CREATION DATE..: Mon, 10 Aug 2009, 13:25
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 40 (http://askmonty.org/worklog/?tid=40)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Knielsen - Fri, 14 Aug 2009, 15:47)=-=-
High-Level Specification modified.
--- /tmp/wklog.40.old.10896 2009-08-14 15:47:39.000000000 +0300
+++ /tmp/wklog.40.new.10896 2009-08-14 15:47:39.000000000 +0300
@@ -72,3 +72,21 @@
   /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ...
 
 and further processing in mysqlbinlog will be trivial.
+
+2.4 Implement server functionality to ignore certain tables
+-----------------------------------------------------------
+
+We could add a general facility in the server to ignore certain tables:
+
+  SET SESSION ignored_tables = "db1.t1,db2.t2";
+
+This would work similar to --replicate-ignore-table, but in a general way not
+restricted to the slave SQL thread.
+
+It would then be trivial for mysqlbinlog to add such statements at the start
+of the output, or probably the user could just do it manually with no need for
+additional options for mysqlbinlog.
+
+It might be useful to integrate this with the code that already handles
+--replicate-ignore-db and similar slave options.
+

-=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.40.old.12989 2009-08-10 15:41:23.000000000 +0300
+++ /tmp/wklog.40.new.12989 2009-08-10 15:41:23.000000000 +0300
@@ -1,6 +1,7 @@
-
 1. Context
 ----------
+(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
+overview)
 At the moment, the server has these replication slave options:
 
   --replicate-do-table=db.tbl

-=-=(Guest - Mon, 10 Aug 2009, 14:52)=-=-
Dependency created: 39 now depends on 40

-=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=-
High Level Description modified.
--- /tmp/wklog.40.old.16985 2009-08-10 14:51:59.000000000 +0300
+++ /tmp/wklog.40.new.16985 2009-08-10 14:51:59.000000000 +0300
@@ -1,3 +1,4 @@
 Replication slave can be set to filter updates to certain tables with
---replicate-[wild-]{do,ignore}-table options. This task is about adding similar
-functionality to mysqlbinlog.
+--replicate-[wild-]{do,ignore}-table options.
+
+This task is about adding similar functionality to mysqlbinlog.

-=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=-
High-Level Specification modified.
--- /tmp/wklog.40.old.16949 2009-08-10 14:51:33.000000000 +0300
+++ /tmp/wklog.40.new.16949 2009-08-10 14:51:33.000000000 +0300
@@ -1 +1,73 @@
 
+1. Context
+----------
+At the moment, the server has these replication slave options:
+
+  --replicate-do-table=db.tbl
+  --replicate-ignore-table=db.tbl
+  --replicate-wild-do-table=pattern.pattern
+  --replicate-wild-ignore-table=pattern.pattern
+
+They affect both RBR and SBR events. SBR events are checked after the
+statement has been parsed, the server iterates over list of used tables and
+checks them againist --replicate instructions.
+
+What is interesting is that this scheme still allows to update the ignored
+table through a VIEW.
+
+2. Table filtering in mysqlbinlog
+---------------------------------
+
+Per-table filtering of RBR events is easy (as it is relatively easy to extract
+the name of the table that the event applies to).
+
+Per-table filtering of SBR events is hard, as generally it is not apparent
+which tables the statement refers to.
+
+This opens possible options:
+
+2.1 Put the parser into mysqlbinlog
+-----------------------------------
+Once we have a full parser in mysqlbinlog, we'll be able to check which tables
+are used by a statement, and will allow to show behaviour identical to those
+that one obtains when using --replicate-* slave options.
+
+(It is not clear how much effort is needed to put the parser into mysqlbinlog.
+Any guesses?)
+
+
+2.2 Use dumb regexp match
+-------------------------
+Use a really dumb approach. A query is considered to be modifying table X if
+it matches an expression
+
+CREATE TABLE $tablename
+DROP $tablename
+UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET'
+DELETE ...$tablename ... WHERE // same as above
+ALTER TABLE $tablename
+.. etc (go get from the grammar) ..
+
+The advantage over doing the same in awk is that mysqlbinlog will also process
+RBR statements, and together with that will provide a working solution for
+those who are careful with their table names not mixing with string constants
+and such.
+
+(TODO: string constants are of particular concern as they come from
+[potentially hostile] users, unlike e.g. table aliases which come from
+[not hostile] developers. Remove also all string constants before attempting
+to do match?)
+
+2.3 Have the master put annotations
+-----------------------------------
+We could add a master option so that it injects into query a mark that tells
+which tables the query will affect, e.g. for the query
+
+  UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ...
+
+
+the binlog will have
+
+  /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ...
+
+and further processing in mysqlbinlog will be trivial.

DESCRIPTION:

Replication slave can be set to filter updates to certain tables with
--replicate-[wild-]{do,ignore}-table options.

This task is about adding similar functionality to mysqlbinlog.

HIGH-LEVEL SPECIFICATION:

1. Context
----------
(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global
overview)
At the moment, the server has these replication slave options:

  --replicate-do-table=db.tbl
  --replicate-ignore-table=db.tbl
  --replicate-wild-do-table=pattern.pattern
  --replicate-wild-ignore-table=pattern.pattern

They affect both RBR and SBR events. SBR events are checked after the
statement has been parsed: the server iterates over the list of used tables and
checks them against the --replicate-* options.

What is interesting is that this scheme still allows the ignored table to be
updated through a VIEW.

2. Table filtering in mysqlbinlog
---------------------------------

Per-table filtering of RBR events is easy (as it is relatively easy to extract
the name of the table that the event applies to).

Per-table filtering of SBR events is hard, as generally it is not apparent
which tables the statement refers to.

This leaves the following options:

2.1 Put the parser into mysqlbinlog
-----------------------------------
Once we have a full parser in mysqlbinlog, we'll be able to check which tables
are used by a statement, and so provide behaviour identical to that obtained
when using the --replicate-* slave options.

(It is not clear how much effort is needed to put the parser into mysqlbinlog.
Any guesses?)


2.2 Use dumb regexp match
-------------------------
Use a really dumb approach. A query is considered to be modifying table X if
it matches an expression

CREATE TABLE $tablename
DROP $tablename
UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET'
DELETE ...$tablename ... WHERE // same as above
ALTER TABLE $tablename
.. etc (go get from the grammar) ..

The advantage over doing the same in awk is that mysqlbinlog will also process
RBR statements, and together with that will provide a working solution for
those who are careful with their table names not mixing with string constants
and such.
(TODO: string constants are of particular concern as they come from [potentially hostile] users, unlike e.g. table aliases which come from [not hostile] developers. Remove also all string constants before attempting to do match?) 2.3 Have the master put annotations ----------------------------------- We could add a master option so that it injects into query a mark that tells which tables the query will affect, e.g. for the query UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ... the binlog will have /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... and further processing in mysqlbinlog will be trivial. 2.4 Implement server functionality to ignore certain tables ----------------------------------------------------------- We could add a general facility in the server to ignore certain tables: SET SESSION ignored_tables = "db1.t1,db2.t2"; This would work similar to --replicate-ignore-table, but in a general way not restricted to the slave SQL thread. It would then be trivial for mysqlbinlog to add such statements at the start of the output, or probably the user could just do it manually with no need for additional options for mysqlbinlog. It might be useful to integrate this with the code that already handles --replicate-ignore-db and similar slave options. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)
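
The "dumb regexp match" of section 2.2 in WL#40 above could look roughly like the sketch below. It is only an illustration in Python, not mysqlbinlog code. The function names strip_string_constants and modifies_table are made up for this example, DROP TABLE is assumed where the list says "DROP $tablename", only the five statement forms listed in 2.2 are covered, and a table name is matched exactly as written (so "t2" does not match "db3.t2"). String constants are blanked before matching, as the TODO suggests.

```python
import re

# Quoted literals with backslash escapes or doubled quotes (assumed syntax).
_SINGLE = r"'(?:[^'\\]|\\.|'')*'"
_DOUBLE = r'"(?:[^"\\]|\\.|"")*"'
_STRING_LITERAL = re.compile(_SINGLE + "|" + _DOUBLE, re.DOTALL)


def strip_string_constants(sql: str) -> str:
    """Blank out string literals so table names inside them cannot match."""
    return _STRING_LITERAL.sub("''", sql)


def modifies_table(sql: str, table: str) -> bool:
    """Guess whether a statement modifies `table` using the 2.2 patterns.

    Hypothetical helper: the name must appear exactly as given, not as part
    of a longer or differently qualified identifier.
    """
    name = re.escape(table)
    word = rf"(?<![\w.$]){name}(?![\w$])"
    patterns = [
        rf"^\s*CREATE\s+TABLE\s+{word}",
        rf"^\s*DROP\s+TABLE\s+{word}",
        rf"^\s*ALTER\s+TABLE\s+{word}",
        # The parts standing for '...' must not contain the word SET / WHERE.
        rf"^\s*UPDATE\b(?:(?!\bSET\b).)*{word}(?:(?!\bSET\b).)*\bSET\b",
        rf"^\s*DELETE\b(?:(?!\bWHERE\b).)*{word}(?:(?!\bWHERE\b).)*\bWHERE\b",
    ]
    cleaned = strip_string_constants(sql)
    flags = re.IGNORECASE | re.DOTALL
    return any(re.search(p, cleaned, flags) for p in patterns)


if __name__ == "__main__":
    stmt = "UPDATE t1 LEFT JOIN db3.t2 ON t1.a = db3.t2.a SET t1.b = 1"
    print(modifies_table(stmt, "t1"), modifies_table(stmt, "db3.t2"))
```

A filter built this way stays a heuristic: it knows nothing about the default database, views or comments, which is the trade-off 2.2 accepts in exchange for not needing a parser.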


[Maria-developers] Updated (by Knielsen): Add a mysqlbinlog option to filter certain kinds of statements (41)
by worklog-noreply＠askmonty.org 14 Aug '09

14 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter certain kinds of statements
CREATION DATE..: Mon, 10 Aug 2009, 15:30
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......:
CATEGORY.......: Client-BackLog
TASK ID........: 41 (http://askmonty.org/worklog/?tid=41)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Knielsen - Fri, 14 Aug 2009, 14:17)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.6963 2009-08-14 14:17:32.000000000 +0300
+++ /tmp/wklog.41.new.6963 2009-08-14 14:17:32.000000000 +0300
@@ -1,6 +1,11 @@
 The implementation will depend on design choices made in WL#40:
-- If we decide to parse the statement, SQL-verb filtering will be trivial
-- If we decide not to parse the statement, we still can reliably distinguish the
+
+Option 1:
+
+If we decide to parse the statement, SQL-verb filtering will be trivial
+
+Option 2:
+If we decide not to parse the statement, we still can reliably distinguish the
 statement by matching the first characters against a set of patterns.

 If we chose the second, we'll have to perform certain normalization before

-=-=(Psergey - Mon, 10 Aug 2009, 15:47)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.13282 2009-08-10 15:47:13.000000000 +0300
+++ /tmp/wklog.41.new.13282 2009-08-10 15:47:13.000000000 +0300
@@ -2,3 +2,10 @@
 - If we decide to parse the statement, SQL-verb filtering will be trivial
 - If we decide not to parse the statement, we still can reliably distinguish the
 statement by matching the first characters against a set of patterns.
+
+If we chose the second, we'll have to perform certain normalization before
+matching the patterns:
+ - Remove all comments from the command
+ - Remove all pre-space
+ - Compare the string case-insensitively
+ - etc

-=-=(Psergey - Mon, 10 Aug 2009, 15:35)=-=-
High-Level Specification modified.
--- /tmp/wklog.41.old.12689 2009-08-10 15:35:04.000000000 +0300
+++ /tmp/wklog.41.new.12689 2009-08-10 15:35:04.000000000 +0300
@@ -1 +1,4 @@
-
+The implementation will depend on design choices made in WL#40:
+- If we decide to parse the statement, SQL-verb filtering will be trivial
+- If we decide not to parse the statement, we still can reliably distinguish the
+statement by matching the first characters against a set of patterns.

-=-=(Psergey - Mon, 10 Aug 2009, 15:31)=-=-
Dependency created: 39 now depends on 41


DESCRIPTION:

Add a mysqlbinlog option to filter certain kinds of statements, e.g. (syntax
subject to discussion):

  mysqlbinlog --exclude='alter table,drop table,alter database,...'


HIGH-LEVEL SPECIFICATION:

The implementation will depend on design choices made in WL#40:

Option 1:

If we decide to parse the statement, SQL-verb filtering will be trivial.

Option 2:
If we decide not to parse the statement, we can still reliably distinguish the
statement by matching its first characters against a set of patterns.

If we choose the second option, we'll have to perform certain normalization
before matching the patterns:
 - Remove all comments from the command
 - Remove all leading whitespace
 - Compare the string case-insensitively
 - etc


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
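
The normalization steps listed for Option 2 of WL#41 above might be sketched as follows. This is a rough Python illustration under stated assumptions, not the proposed mysqlbinlog implementation: the helper names normalize and is_excluded are invented here, the exclude list uses the comma-separated form from the --exclude example (itself "subject to discussion"), and comments are stripped without looking inside string literals, which is tolerable only because just the start of the statement is compared. MySQL executes the body of /*! ... */ comments, so a real filter would have to treat those differently.

```python
import re

# /* ... */ block comments, "-- " comments and "#" comments up to end of line.
_COMMENTS = re.compile(r"/\*.*?\*/|(?:--[ \t]|#)[^\n]*", re.DOTALL)


def normalize(sql: str) -> str:
    """Strip comments, trim and collapse whitespace, and lowercase."""
    without_comments = _COMMENTS.sub(" ", sql)
    return " ".join(without_comments.split()).lower()


def is_excluded(sql: str, exclude: str) -> bool:
    """True if the statement starts with one of the excluded SQL verbs.

    `exclude` is a comma-separated list such as 'alter table,drop table'
    (hypothetical option format).
    """
    verbs = [" ".join(v.split()).lower() for v in exclude.split(",") if v.strip()]
    head = normalize(sql)
    return any(head == v or head.startswith(v + " ") for v in verbs)


if __name__ == "__main__":
    stmt = "  /* hint */ Alter   TABLE t1 ADD c INT"
    print(is_excluded(stmt, "alter table,drop table,alter database"))
```

Matching on "verb followed by a space" rather than on a bare prefix keeps 'alter table' from also matching a hypothetical 'alter tablespace' statement.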


[Maria-developers] Progress (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

14 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 20
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Fri, 14 Aug 2009, 09:13)=-=-
2009-8-10: spent 3.5 hrs for analysis of the current implementation of
           UNION/UNION ALL
           came up with the idea how to bypass temporary table when executing
           UNION ALL
2009-8-11: spent 6.5 hrs to prepare a hack that executed UNION ALL without
           temporary table
2009-8-12: spent 4 hrs more to investigate in debugger different cases with
           usage of union operations (in subqueries, in queries that do not
           use tables)
2009-8-13: spent 6 hrs to put together and to publish an HLS document for the
           task

Worked 20 hours and estimate 0 hours remain (original estimate increased by 20 hours).

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n

-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
 UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
 allows us to use these operations in a sequence, one after another. For example
 the following queries are accepted by the MySQL Server:
-  (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
-a2!=b2) union
+  (select a1,b1,c1 from t1 where a1=b1) union
+  (select a2,b2,c2 from t2 where a2!=b2) union
   (select a3,b3,c3 from t3 where a3>b3); (1)
-  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union all
+  (select a1,b1,c1 from t1 where a1=b1) union all
+  (select a2,b2,c2 from t2 where a2!=b2) union all
   (select a3,b3,c3 from t3 where a3>b3); (2)
 Any mix of UNION and UNION ALL is also acceptable:
-  (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
-a2!=b2) union all
+  (select a1,b1,c3 from t1 where a1=b1) union
+  (select a2,b2,c3 from t2 where a2!=b2) union all
   (select a3,b3,c3 from t3 where a3>b3); (3)
-  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
-a2!=b2) union
+  (select a1,b1,c1 from t1 where a1=b1) union all
+  (select a2,b2,c2 from t2 where a2!=b2) union
   (select a3,b3,c3 from t3 where a3>b3); (4)
+
 It should be noted that query (4) is equivalent to query (1). At the same time
 query (3) is not equivalent to any of the queries (1),(2),(4).
 In general any UNION ALL in a sequence of union operations can be equivalently
 substituted for UNION if there occur another UNION further in the sequence.
-MySQL does not accept nested unions. For example the following valid query is
-considered by MySQL Server as erroneous:
-  ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
-) union all
-  ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+MySQL does not accept nested unions. For example the following valid SQL query
+is considered by MySQL Server as erroneous:
+  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
+  union all
+  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))

 A sequence of select constructs separated by UNION/UNION ALL is called 'union
 unit' if it s not a part of another such sequence.

-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
 2. Optimizations improving performance of UNION ALL operations
   2.1 Execution of UNION ALL without temporary table
   2.2. Avoiding unnecessary copying
-  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
 3. Other possible optimizations for union units
 </contents>

 1. Handling union operations in MySQL Server
-==================================
+============================================

 1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------

 UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
 allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
 In this case it cannot be used as a subquery.

 1.2 Validation of union units
-----------------------------------
+-----------------------------

 When the parser stage is over the further processing of a union unit is
 performed by the function mysql_union.
@@ -77,7 +77,7 @@
 select_union. All selects from a union unit share the same select_union object.

 1.3 Execution of union units
-----------------------------------
+----------------------------

 After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
 created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
 rows read from the temporary table have to be sorted first.

 2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================

 The following three optimizations are proposed to be implemented in the
 framework of this task.

 2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------

 If a union unit with only UNION ALL operations is used at the top level of the
 query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
 };

 2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------

 If a field does not need type conversion it does not make sense to send it to
 a record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
 needed that would take as parameter the info that says what fields are to be
 stored in the record buffer.

-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------

 If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY
 is used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@


 3. Other possible optimizations for union units
-=================================
+===============================================

 The following optimizations are not supposed to be implemented in the framework
 this task.

-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+  1.1. Specifics of MySQL union operations
+  1.2 Validation of union units
+  1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+  2.1 Execution of UNION ALL without temporary table
+  2.2. Avoiding unnecessary copying
+  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+  (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+  (select a3,b3,c3 from t3 where a3>b3); (1)
+  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+  (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+  (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+  (select a3,b3,c3 from t3 where a3>b3); (3)
+  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+  (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+  ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+  ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore.
In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. +The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. 
Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. + +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. 
Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+

DESCRIPTION:

Currently when any union operation is executed the rows received from its
operands are always sent to a temporary table. Meanwhile for a UNION ALL
operation that is used at the top level of a query without an ORDER BY clause it
is not necessary. In this case the rows could be sent directly to the client.

The goal of this task is to provide such an implementation of UNION ALL
operation that would not use temporary table at all in certain, most usable
cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another.
For example the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);           (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);           (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c3 from t1 where a1=b1) union
  (select a2,b2,c3 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);           (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);           (4)

It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
MySQL does not accept nested unions. For example the following valid SQL query
is considered by MySQL Server as erroneous:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible.
The method checks that the selects return the same number of columns and that,
for each set of columns with the same number k, there is a type to which the
types of the columns can be coerced. This type is considered as the type of
column k of the result set returned by the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it is executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added, for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select the method first stores
the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then
the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
===============================================================

The following three optimizations are proposed to be implemented in the
framework of this task.
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not appended with an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. It would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. When the result set is so big that
the temporary table cannot be held in main memory the performance gains would be
significant. Besides, the client could get the first result rows at once as it
would not have to wait until all selects have been executed.
To make a UNION ALL operation not send rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of the class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual.
The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversions and which don't. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it is done in the
current implementation.
More exactly, the rows from any select that follows the second operand of the
last UNION operation could be sent directly to the output stream. In this case
two interceptor objects have to be created: one, of the type select_union, is
shared by the selects for which UNION operations are performed; another, of the
type select_union_send, is shared by the remaining selects. For this
optimization the method SELECT_LEX_UNIT::exec is to undergo a serious re-work.

3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as a lookup in the temporary table has confirmed that
it is not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
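
The index handling described in section 1.3 of the specification above (the
unique index is enabled before the first select and disabled after the last
UNION operation) can be illustrated with a small stand-alone model. This is only
a sketch under stated assumptions, not server code: Row, RowSet, UnionOp and
run_union_unit are hypothetical names introduced for illustration, a std::set
stands in for the unique index of the temporary table, and rows come back in
insertion order. It may help to see why query (4) is equivalent to query (1)
while query (3) is not.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model (hypothetical names, not MySQL/MariaDB code).
using Row = std::vector<std::string>;
using RowSet = std::vector<Row>;

enum class UnionOp { Distinct, All };  // UNION vs. UNION ALL

// ops[i] is the operation between selects[i] and selects[i + 1].
RowSet run_union_unit(const std::vector<RowSet>& selects,
                      const std::vector<UnionOp>& ops)
{
  // Find the right operand of the last UNION (distinct) operation, if any.
  bool any_distinct = false;
  std::size_t last_distinct_operand = 0;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (ops[i] == UnionOp::Distinct) {
      any_distinct = true;
      last_distinct_operand = i + 1;
    }
  }

  RowSet result;               // stands in for the temporary table
  std::set<Row> unique_index;  // stands in for its unique index
  for (std::size_t i = 0; i < selects.size(); ++i) {
    // The index is used from the first select up to and including the
    // right operand of the last UNION; it is never used for pure UNION ALL.
    const bool index_enabled = any_distinct && i <= last_distinct_operand;
    for (const Row& row : selects[i]) {
      if (index_enabled && !unique_index.insert(row).second)
        continue;  // duplicate-key "error" is ignored, row is dropped
      result.push_back(row);
    }
  }
  return result;
}
```

With three selects returning the single-column rows {1,1,2}, {2,3} and {3,3},
this model yields three rows for the operator sequences of queries (1) and (4),
five rows for query (3), and all seven rows for query (2).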


[Maria-developers] Progress (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09


-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 20
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Fri, 14 Aug 2009, 09:13)=-=-
2009-8-10: spent 3.5 hrs for analysis of the current implementation of
           UNION/UNION ALL
           came up with the idea how to bypass temporary table when
           executing UNION ALL
2009-8-11: spent 6.5 hrs to prepare a hack that executed UNION ALL without
           temporary table
2009-8-12: spent 4 hrs more to investigate in debugger different cases with
           usage of union operations (in subqueries, in queries that do not
           use tables)
2009-8-13: spent 6 hrs to put together and to publish an HLS document for
           the task

Worked 20 hours and estimate 0 hours remain (original estimate increased by 20 hours).

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n

-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300 +++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300 @@ -19,28 +19,29 @@ UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example the following queries are accepted by the MySQL Server: - (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (1) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union all + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (2) Any mix of UNION and UNION ALL is also acceptable: - (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where -a2!=b2) union all + (select a1,b1,c3 from t1 where a1=b1) union + (select a2,b2,c3 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (3) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (4) + It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1),(2),(4). In general any UNION ALL in a sequence of union operations can be equivalently substituted for UNION if there occur another UNION further in the sequence. -MySQL does not accept nested unions. For example the following valid query is -considered by MySQL Server as erroneous: - ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) -) union all - ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) +MySQL does not accept nested unions. 
For example the following valid SQL query +is considered by MySQL Server as erroneous: + ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)) + union all + ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4)) A sequence of select constructs separated by UNION/UNION ALL is called 'union unit' if it s not a part of another such sequence. -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 
1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. -2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. 
Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). 
At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. +The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). 
If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. +The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. 
+To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. + +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. 
In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. +The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. 
Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. + +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. 
Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+

DESCRIPTION:

Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL operation that
is used at the top level of a query without an ORDER BY clause this is not
necessary: the rows could be sent directly to the client.

The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common
cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another.
For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);                (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);                (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);                (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);                (4)

Note that query (4) is equivalent to query (1). At the same time query (3) is
not equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
MySQL does not accept nested unions. For example, the following valid SQL query
is considered by MySQL Server as erroneous:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible.
The method checks that the selects return
the same number of columns and that, for each set of columns at the same
position k, there is a type to which the types of the columns can be coerced.
This type is taken as the type of column k of the result set returned by the
union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then a collation from which all these collations can be derived is looked for
and is assigned as the collation of the third column in the result set.
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it is executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error raised by an attempt to write a duplicate row is
ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
===============================================================

The following three optimizations are proposed to be implemented in the
framework of this task.

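
As a baseline for these optimizations, the current temporary-table scheme from
section 1.3 can be modelled roughly as follows. This is only a sketch under
stated assumptions, not server code: TempTable, Op, Row and exec_union_unit are
hypothetical names, a std::set stands in for the unique index, and each select
is represented by its already computed rows.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names are taken
// from the server source.
using Row = std::vector<std::string>;

enum class Op { Union, UnionAll };

struct TempTable {
    std::vector<Row> rows;       // rows in the order they were written
    std::set<Row> unique_index;  // models the unique index on all fields
    bool index_enabled = false;

    // Models writing one row: while the index is enabled, a duplicate-key
    // error is silently ignored and the row is dropped.
    void write_row(const Row& r) {
        if (index_enabled) {
            if (unique_index.count(r) != 0) return;
            unique_index.insert(r);
        }
        rows.push_back(r);
    }
};

// selects[i] holds the rows of the i-th select; ops[i] is the operation
// between selects[i] and selects[i + 1].
std::vector<Row> exec_union_unit(const std::vector<std::vector<Row>>& selects,
                                 const std::vector<Op>& ops) {
    // Position of the last select that is an operand of a UNION,
    // or -1 when the unit contains only UNION ALL.
    int last_distinct = -1;
    for (std::size_t i = 0; i < ops.size(); ++i)
        if (ops[i] == Op::Union) last_distinct = static_cast<int>(i) + 1;

    TempTable tmp;
    tmp.index_enabled = (last_distinct >= 0);  // enabled before the first select
    for (std::size_t i = 0; i < selects.size(); ++i) {
        // Disabled once the last UNION operation has been processed.
        if (static_cast<int>(i) > last_distinct) tmp.index_enabled = false;
        for (const Row& r : selects[i]) tmp.write_row(r);
    }
    return tmp.rows;
}
```

With three single-column selects returning {1,2}, {2,3} and {3,3}, this model
yields 3 rows for the shapes of queries (1) and (4), 6 rows for query (2) and
5 rows for query (3), which matches the equivalences stated in section 1.1.
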

2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the
query (in other words, it is not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. When the result set is so big that
the temporary table cannot be allocated in main memory, the performance gains
would be significant. Besides, the client could get the first result rows at
once, as it would not have to wait until all selects have been executed.
To make a UNION ALL operation bypass the temporary table we could provide the
JOIN objects created for the selects from the union unit with an interceptor
object that differs from the one they use now. In the current code they use an
object of the class select_union derived from the select_result_interceptor
class. The new interceptor object, of a class that we'll call select_union_send
(by analogy with the class select_send), shall inherit from select_union and
shall have its own implementations of the virtual methods send_data,
send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual.
The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversions and which don't. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query, then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it is done in the
current implementation.
More precisely, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
undergo a serious re-work.

3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as a lookup in the temporary table has confirmed that
it is not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
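
As an illustrative aside to section 2.1 of the specification above, the
difference between the current interceptor and the proposed select_union_send
can be sketched as two result sinks. This is a simplified model under stated
assumptions, not the server's interface: ResultSink, BufferingUnionSink,
StreamingUnionSink and Row are hypothetical names, the client is modelled as a
vector of rows, and type conversion through the record buffer is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified types for illustration only; the real
// select_result_interceptor interface in the server is larger and different.
using Row = std::vector<std::string>;

struct ResultSink {
    virtual ~ResultSink() = default;
    // In this sketch both methods return true on error, false on success.
    virtual bool send_data(const Row& row) = 0;
    virtual bool send_eof() = 0;
};

// Models the current behaviour: every row is written to a temporary table
// and the client sees nothing until all selects have finished.
struct BufferingUnionSink : ResultSink {
    explicit BufferingUnionSink(std::vector<Row>& client) : client_(client) {}

    bool send_data(const Row& row) override {
        temp_table_.push_back(row);
        return false;
    }
    bool send_eof() override {
        for (const Row& r : temp_table_) client_.push_back(r);
        temp_table_.clear();
        return false;
    }

protected:
    std::vector<Row>& client_;     // stands in for the output stream
    std::vector<Row> temp_table_;  // stands in for the temporary table
};

// Models the proposed select_union_send: it inherits from the buffering
// sink but forwards each row to the client as soon as it arrives.
struct StreamingUnionSink : BufferingUnionSink {
    using BufferingUnionSink::BufferingUnionSink;

    bool send_data(const Row& row) override {
        client_.push_back(row);
        return false;
    }
    bool send_eof() override { return false; }  // nothing left to flush
};
```

With the streaming sink the client holds the first row immediately after the
first send_data call, whereas with the buffering sink it stays empty until
send_eof, which is the latency benefit the specification describes.
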

[Maria-developers] Progress (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

14 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 20
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Fri, 14 Aug 2009, 09:13)=-=-
2009-8-10:
  spent 3.5 hrs for analysis of the current implementation of UNION/UNION ALL
  came up with the idea how to bypass temporary table when executing UNION ALL
2009-8-11:
  spent 6.5 hrs to prepare a hack that executed UNION ALL without temporary table
2009-8-12:
  spent 4 hrs more to investigate in debugger different cases with usage of
  union operations (in subqueries, in queries that do not use tables)
2009-8-13:
  spent 6 hrs to put together and to publish an HLS document for the task

Worked 20 hours and estimate 0 hours remain (original estimate increased by 20 hours).

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n

-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300 +++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300 @@ -19,28 +19,29 @@ UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example the following queries are accepted by the MySQL Server: - (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (1) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union all + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (2) Any mix of UNION and UNION ALL is also acceptable: - (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where -a2!=b2) union all + (select a1,b1,c3 from t1 where a1=b1) union + (select a2,b2,c3 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (3) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (4) + It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1),(2),(4). In general any UNION ALL in a sequence of union operations can be equivalently substituted for UNION if there occur another UNION further in the sequence. -MySQL does not accept nested unions. For example the following valid query is -considered by MySQL Server as erroneous: - ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) -) union all - ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) +MySQL does not accept nested unions. 
For example the following valid SQL query +is considered by MySQL Server as erroneous: + ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)) + union all + ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4)) A sequence of select constructs separated by UNION/UNION ALL is called 'union unit' if it s not a part of another such sequence. -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 
1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. -2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. 
Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). 
At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. +The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). 
If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. +The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. 
+To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. + +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. 
In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. +The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. 
Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. + +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. 
Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+

DESCRIPTION:

Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL operation that
is used at the top level of a query without an ORDER BY clause this is not
necessary: the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common
cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2. Validation of union units
  1.3. Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1. Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3. Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server.
MySQL allows us to use these operations in a sequence, one after another.
For example, the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);              (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);              (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);              (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);              (4)

It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be equivalently
replaced with UNION if another UNION occurs later in the sequence.
MySQL does not accept nested unions. For example, the following query, which is
valid SQL, is rejected by MySQL Server as erroneous:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
   union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2. Validation of union units
------------------------------

When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. The method checks that the selects
return the same number of columns and that for each set of columns with the
same number k there is a type to which the types of the columns can be coerced.
This type is considered as the type of column k of the result set returned by
the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
a collation from which all these collations can be derived is looked for and
assigned as the collation of the third column of the result set.
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.

1.3. Execution of union units
-----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added, for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error raised by an attempt to write a duplicate row is
ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then
the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
==============================================================

The following three optimizations are proposed to be implemented in the
framework of this task.

2.1. Execution of UNION ALL without temporary table
---------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not followed by an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. In the cases when the result set is
so big that the temporary table cannot be allocated in main memory the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
To make a UNION ALL operation bypass the temporary table we could provide the
JOIN objects created for the selects from the union unit with an interceptor
object that differs from the one they use now. In the current code they use an
object of the class select_union derived from the select_result_interceptor
class. The new interceptor object, of a class that we'll call select_union_send
(by analogy with the class select_send), shall inherit from select_union and
shall have its own implementations of the virtual methods send_data,
send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.

2.3. Optimizing execution of a union unit with a mix of UNION/UNION ALL
-----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it's done in the
current implementation. More precisely, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to the
output stream. In this case two interceptor objects have to be created: one, of
the type select_union, is shared by the selects for which UNION operations are
performed; another, of the type select_union_send, is shared by the remaining
selects. For this optimization the method SELECT_LEX_UNIT::exec is to undergo a
serious re-work.

3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented within the
framework of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as it has been checked by a lookup in the temporary table
that it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
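
The rule from section 2.3 (which selects still need the temporary table and
which could stream their rows) can be sketched in a few lines. This is only an
illustration under stated assumptions, not server code: the names Sink and
plan_sinks are hypothetical, the union unit is modelled as a plain list of
flags, and the unit is assumed to be at the top level of the query without
ORDER BY.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not part of the server sources.
// Where the rows produced by one select of a union unit would go.
enum class Sink { TempTable, OutputStream };

// op_is_distinct[i] describes the operation between select i and select i+1:
// true means UNION (duplicates removed), false means UNION ALL.
// A unit with n selects therefore has n-1 entries.
// Assumption: top-level union unit without ORDER BY.
std::vector<Sink> plan_sinks(const std::vector<bool> &op_is_distinct)
{
  const std::size_t n_selects = op_is_distinct.size() + 1;

  // Number of leading selects that must still go through the temporary
  // table: everything up to and including the second operand of the last
  // UNION operation. Stays 0 when the unit contains only UNION ALL.
  std::size_t n_buffered = 0;
  for (std::size_t i = 0; i < op_is_distinct.size(); ++i)
    if (op_is_distinct[i])
      n_buffered = i + 2;  // select i+1 is the second operand of this UNION

  assert(n_buffered <= n_selects);

  std::vector<Sink> sinks(n_selects, Sink::OutputStream);
  for (std::size_t i = 0; i < n_buffered; ++i)
    sinks[i] = Sink::TempTable;
  return sinks;
}
```

For query (3) the flags are {true, false}, so the first two selects are
buffered and the third one streams. For query (2) nothing is buffered, which is
the case of section 2.1. For queries (1) and (4) every select is buffered,
which matches the observation in section 1.1 that these two are equivalent.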


[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n

-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
 UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
 allows us to use these operations in a sequence, one after another.
For example the following queries are accepted by the MySQL Server: - (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (1) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union all + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (2) Any mix of UNION and UNION ALL is also acceptable: - (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where -a2!=b2) union all + (select a1,b1,c3 from t1 where a1=b1) union + (select a2,b2,c3 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (3) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (4) + It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1),(2),(4). In general any UNION ALL in a sequence of union operations can be equivalently substituted for UNION if there occur another UNION further in the sequence. -MySQL does not accept nested unions. For example the following valid query is -considered by MySQL Server as erroneous: - ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) -) union all - ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) +MySQL does not accept nested unions. 
For example the following valid SQL query +is considered by MySQL Server as erroneous: + ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)) + union all + ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4)) A sequence of select constructs separated by UNION/UNION ALL is called 'union unit' if it s not a part of another such sequence. -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 
1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. -2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. 
Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). 
At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. +The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). 
If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. +The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. 
+To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. + +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. 
In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. +The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. 
Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. + +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. 
Other possible optimizations for union units +================================= + +The following optimizations are not supposed to be implemented in the framework +this task. +1. For a union unit containing only UNION ALL with an ORDER BY send rows from +selects directly to the sorting procedure. +2. For a union unit at the top level of the query without ORDER BY clause send +any row received from an operand of a UNION operation directly to the output +stream as soon as it has been checked by a lookup in the temporary table that +it's not a duplicate. +3. Not to use temporary table for any union unit used in EXIST or IN subquery. +

DESCRIPTION:

Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL operation that
is used at the top level of a query without an ORDER BY clause this is not
necessary. In this case the rows could be sent directly to the client.

The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain commonly used
cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);       (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);       (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);       (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);       (4)

It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
MySQL does not accept nested unions. For example the following valid SQL query
is rejected by MySQL Server as erroneous:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. The method checks that the selects
return the same number of columns and that for each set of columns with the same
position k there is a type to which the types of the columns can be coerced.
This type is taken as the type of column k of the result set returned by the
union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it is executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added, for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select the method first stores
the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then
the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
==============================================================

The following three optimizations are proposed to be implemented in the
framework of this task.

2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not followed by an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. In the cases when the result set is
big enough that the temporary table cannot be allocated in main memory the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
To make a UNION ALL operation not send rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);               // convert, then send a row
    bool send_fields(List<Item> &list, uint flags);  // send the row format once
    bool send_eof();                                 // after the last select
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it's done in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed, another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
undergo a serious re-work.


3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without ORDER BY clause send
any row received from an operand of a UNION operation directly to the output
stream as soon as it has been checked by a lookup in the temporary table that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
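
The rule from section 1.1 (a UNION ALL followed later by a UNION behaves like a UNION) and the split proposed in sections 2.1 and 2.3 can be illustrated with a small standalone C++ sketch. It is not server code: UnionOp, RowSink, UnionPlan and plan_union_unit are hypothetical names introduced only for this illustration, and the sketch assumes that the unit is described simply by the list of operators between its selects.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; they do not exist in the server.
enum class UnionOp { Union, UnionAll };
enum class RowSink { TempTable, Direct };

struct UnionPlan {
  std::vector<UnionOp> effective;  // operators after UNION ALL -> UNION folding
  std::vector<RowSink> sink;       // where the rows of each select are sent
};

// ops[i] joins select i and select i+1, so n selects are described by n-1 ops.
UnionPlan plan_union_unit(const std::vector<UnionOp>& ops,
                          bool top_level, bool has_order_by) {
  const std::size_t n_selects = ops.size() + 1;

  // Number of leading operators up to and including the last UNION.
  std::size_t distinct_ops = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Union) distinct_ops = i + 1;

  UnionPlan plan;
  plan.effective = ops;
  // Any UNION ALL that precedes a later UNION is equivalent to UNION.
  for (std::size_t i = 0; i < distinct_ops; ++i)
    plan.effective[i] = UnionOp::Union;

  // By default every select writes into the temporary table (current behavior).
  plan.sink.assign(n_selects, RowSink::TempTable);

  // Direct sending is only considered at the top level without ORDER BY.
  if (top_level && !has_order_by) {
    // Without any UNION all selects can stream (section 2.1); otherwise only
    // the selects after the second operand of the last UNION (section 2.3).
    const std::size_t first_direct = distinct_ops == 0 ? 0 : distinct_ops + 1;
    for (std::size_t i = first_direct; i < n_selects; ++i)
      plan.sink[i] = RowSink::Direct;
  }
  return plan;
}
```

Under these assumptions, query (3) sends its first two selects to the temporary table and the third one directly to the output stream, query (4) is folded into the same operator sequence as query (1) and keeps using the temporary table for all selects, and query (2) streams every select.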


[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09


-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n

-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
 UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
 allows us to use these operations in a sequence, one after another.
For example the following queries are accepted by the MySQL Server: - (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (1) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union all + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (2) Any mix of UNION and UNION ALL is also acceptable: - (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where -a2!=b2) union all + (select a1,b1,c3 from t1 where a1=b1) union + (select a2,b2,c3 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (3) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (4) + It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1),(2),(4). In general any UNION ALL in a sequence of union operations can be equivalently substituted for UNION if there occur another UNION further in the sequence. -MySQL does not accept nested unions. For example the following valid query is -considered by MySQL Server as erroneous: - ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) -) union all - ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) +MySQL does not accept nested unions. 
For example the following valid SQL query +is considered by MySQL Server as erroneous: + ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)) + union all + ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4)) A sequence of select constructs separated by UNION/UNION ALL is called 'union unit' if it s not a part of another such sequence. -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 
1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. -2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. 
Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). 
At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. +The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). 
If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. +The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. 
+To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. + +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. 
In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. +The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. 
Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. + +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. 
Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+

DESCRIPTION:

Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL operation that
is used at the top level of a query without an ORDER BY clause this is not
necessary: the rows could be sent directly to the client.

The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common
cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another.
For example the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);                                  (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);                                  (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c3 from t1 where a1=b1) union
  (select a2,b2,c3 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);                                  (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);                                  (4)

It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general, any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
MySQL does not accept nested unions. For example the following valid SQL query
is rejected by MySQL Server as erroneous:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then it
checks that all selects are compatible.
The method checks that the selects return the same number of columns and that,
for each set of columns with the same position k, there is a type to which the
types of the columns can be coerced. This type is considered as the type of
column k of the result set returned by the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it is executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
===============================================================

The following three optimizations are proposed to be implemented in the
framework of this task.
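
As a rough illustration of the current execution scheme described in section
1.3, the sketch below models the temporary table and its unique index with
standard containers. It is not server code: the names exec_union_unit, Row and
RowSet are hypothetical, a std::set stands in for the unique index, and the
operations between selects are passed as a plain vector of flags.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: a row is a list of field values, a select result
// is a list of rows.
using Row = std::vector<std::string>;
using RowSet = std::vector<Row>;

// Model of a union unit: selects[i] is the result of the i-th select;
// distinct[i] is true if select i and select i+1 are joined by UNION and
// false if they are joined by UNION ALL.
RowSet exec_union_unit(const std::vector<RowSet> &selects,
                       const std::vector<bool> &distinct)
{
  // Index of the second operand of the last UNION operation
  // (stays 0 if the unit contains only UNION ALL operations).
  std::size_t last_distinct_operand = 0;
  for (std::size_t i = 0; i < distinct.size(); ++i)
    if (distinct[i])
      last_distinct_operand = i + 1;

  RowSet temp_table;           // models the temporary table
  std::set<Row> unique_index;  // models the unique index on all fields
  bool index_enabled = last_distinct_operand > 0;

  for (std::size_t i = 0; i < selects.size(); ++i)
  {
    for (const Row &row : selects[i])
    {
      // A duplicate-key error is ignored: the row is simply not added.
      if (index_enabled && !unique_index.insert(row).second)
        continue;
      temp_table.push_back(row);
    }
    // The index is disabled after the last UNION operation.
    if (i == last_distinct_operand)
      index_enabled = false;
  }
  return temp_table;
}
```

With this model, queries of the shape (1) and (4) produce the same rows, while
a query of the shape (3) keeps every row of its last select, which matches the
equivalences stated in section 1.1.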

2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done,
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. In the cases when the result set is
big enough that the temporary table cannot be allocated in main memory, the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
To make a UNION ALL operation bypass the temporary table we could provide the
JOIN objects created for the selects from the union unit with an interceptor
object that differs from the one they use now. In the current code they use an
object of the class select_union derived from the select_result_interceptor
class. The new interceptor object, of a class that we'll call select_union_send
(by analogy with the class select_send), shall inherit from select_union and
shall have its own implementations of the virtual methods send_data,
send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual.
The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversions and which don't. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query, then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it's done in the
current implementation.
More precisely, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
undergo a serious re-work.


3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as it has been checked by a lookup in the temporary table
that it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)


[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

14 Aug '09

-----------------------------------------------------------------------
                            WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Supervisor updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Bothorsen
+Monty

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Version updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x

-=-=(Guest - Fri, 14 Aug 2009, 08:52)=-=-
Privacy level updated.
--- /tmp/wklog.44.old.22769 2009-08-14 08:52:13.000000000 +0300
+++ /tmp/wklog.44.new.22769 2009-08-14 08:52:13.000000000 +0300
@@ -1 +1 @@
-y
+n

-=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300
+++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300
@@ -19,28 +19,29 @@
 UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
 allows us to use these operations in a sequence, one after another.
For example the following queries are accepted by the MySQL Server: - (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (1) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union all + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (2) Any mix of UNION and UNION ALL is also acceptable: - (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where -a2!=b2) union all + (select a1,b1,c3 from t1 where a1=b1) union + (select a2,b2,c3 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (3) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (4) + It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1),(2),(4). In general any UNION ALL in a sequence of union operations can be equivalently substituted for UNION if there occur another UNION further in the sequence. -MySQL does not accept nested unions. For example the following valid query is -considered by MySQL Server as erroneous: - ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) -) union all - ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) +MySQL does not accept nested unions. 
For example the following valid SQL query +is considered by MySQL Server as erroneous: + ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)) + union all + ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4)) A sequence of select constructs separated by UNION/UNION ALL is called 'union unit' if it s not a part of another such sequence. -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 
1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. -2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. 
Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). 
At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. +The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). 
If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. +The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. 
+To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. + +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. 
In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. +The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. 
Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. + +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. 
Other possible optimizations for union units +================================= + +The following optimizations are not supposed to be implemented in the framework +this task. +1. For a union unit containing only UNION ALL with an ORDER BY send rows from +selects directly to the sorting procedure. +2. For a union unit at the top level of the query without ORDER BY clause send +any row received from an operand of a UNION operation directly to the output +stream as soon as it has been checked by a lookup in the temporary table that +it's not a duplicate. +3. Not to use temporary table for any union unit used in EXIST or IN subquery. +

DESCRIPTION:

Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL operation that is
used at the top level of a query without an ORDER BY clause this is not
necessary: the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL operation
that does not use a temporary table at all in certain, most common cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another.
For example, the following queries are accepted by MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);       (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);       (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);       (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);       (4)
Note that query (4) is equivalent to query (1), whereas query (3) is not
equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
MySQL does not accept nested unions. For example, the following valid SQL query
is considered erroneous by MySQL Server:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible.
The method checks that the selects return the same number of columns and that,
for each set of columns at the same position k, there is a type to which the
types of the columns can be coerced. This type is taken as the type of column k
of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then a collation from which all these collations can be derived is looked for
and is assigned as the collation of the third column in the result set.
After the compatibility of the corresponding select columns has been checked and
the types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible duplicate-key error that occurs on an attempt to write a duplicate
row is ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
===============================================================

The following three optimizations are proposed to be implemented in the
framework of this task.
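As background for these optimizations, the rule from section 1.3 for when the
unique index is in effect can be modelled in a few lines. This is only an
illustrative sketch, not server code: the names UnionOp, dedup_prefix_length
and is_deduplicated are hypothetical, and the operators of a union unit are
assumed to be given as a list in which ops[i] stands between select i and
select i+1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the operators in a union unit:
// ops[i] is the operator between select i and select i+1.
enum class UnionOp { Union, UnionAll };

// Number of leading selects whose rows are written while the unique index
// is enabled, i.e. all selects up to and including the second operand of
// the last UNION. Returns 0 when the unit contains only UNION ALL.
std::size_t dedup_prefix_length(const std::vector<UnionOp>& ops) {
  for (std::size_t i = ops.size(); i > 0; --i) {
    if (ops[i - 1] == UnionOp::Union) {
      return i + 1;  // selects 0..i take part in duplicate elimination
    }
  }
  return 0;
}

// True if the rows of the select with the given index are deduplicated.
bool is_deduplicated(const std::vector<UnionOp>& ops,
                     std::size_t select_index) {
  assert(select_index <= ops.size());  // n operators join n+1 selects
  return select_index < dedup_prefix_length(ops);
}
```

For query (3) (UNION, then UNION ALL) the prefix length is 2: the first two
selects are deduplicated and the rows of the third are appended as they are.
For query (4) (UNION ALL, then UNION) it is 3, which is why (4) is equivalent
to (1).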
2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the
query (in other words, it is not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done,
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would no longer be needed. In the cases when the result set is
so big that the temporary table cannot be allocated in main memory, the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
To make a UNION ALL operation not send rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual.
The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion, it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which don't. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query, then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it's done in the
current implementation.
More precisely, the rows from any select that follows the second operand of the
last UNION operation could be sent directly to the output stream. In this case
two interceptor objects have to be created: one, of the type select_union, is
shared by the selects for which UNION operations are performed; another, of the
type select_union_send, is shared by the remaining selects. For this
optimization the method SELECT_LEX_UNIT::exec is to undergo a serious re-work.


3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as it has been checked by a lookup in the temporary table
that it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)


[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09


----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300 +++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300 @@ -19,28 +19,29 @@ UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example the following queries are accepted by the MySQL Server: - (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (1) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union all + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (2) Any mix of UNION and UNION ALL is also acceptable: - (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where -a2!=b2) union all + (select a1,b1,c3 from t1 where a1=b1) union + (select a2,b2,c3 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (3) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union 
(select a3,b3,c3 from t3 where a3>b3); (4) + It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1),(2),(4). In general any UNION ALL in a sequence of union operations can be equivalently substituted for UNION if there occur another UNION further in the sequence. -MySQL does not accept nested unions. For example the following valid query is -considered by MySQL Server as erroneous: - ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) -) union all - ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) +MySQL does not accept nested unions. For example the following valid SQL query +is considered by MySQL Server as erroneous: + ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)) + union all + ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4)) A sequence of select constructs separated by UNION/UNION ALL is called 'union unit' if it s not a part of another such sequence. -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. 
MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. 
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. 
For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. 
+The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. 
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. +To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. 
+ +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. 
+The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. 
+ +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. Other possible optimizations for union units +================================= + +The following optimizations are not supposed to be implemented in the framework +this task. +1. For a union unit containing only UNION ALL with an ORDER BY send rows from +selects directly to the sorting procedure. +2. For a union unit at the top level of the query without ORDER BY clause send +any row received from an operand of a UNION operation directly to the output +stream as soon as it has been checked by a lookup in the temporary table that +it's not a duplicate. +3. Not to use temporary table for any union unit used in EXIST or IN subquery. + DESCRIPTION: Currently when any union operation is executed the rows received from its operands are always sent to a temporary table. Meanwhile for a UNION ALL operation that is used at the top level of a query without an ORDER BY clause it is not necessary. In this case the rows could be sent directly to the client. 
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common
cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);            (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);            (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);            (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);            (4)

It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced with UNION if another UNION occurs later in the sequence.
MySQL does not accept nested unions.
For example the following valid SQL query is considered by MySQL Server as
erroneous:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then it
checks that all selects are compatible. The method checks that the selects
return the same number of columns and that for each set of columns in the same
position k there is a type to which the types of the columns can be coerced.
This type is considered as the type of column k of the result set returned by
the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
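The column check described above can be illustrated with a small standalone
sketch. It is not the server code: the names Kind, ColType, common_type and
union_column_types are hypothetical, only the int/bigint/double and varchar
cases from the example are modelled, and collations are left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified column type; not the server's Item/Field classes.
// Numeric kinds are declared in widening order: Int < BigInt < Double.
enum class Kind { Int, BigInt, Double, Varchar };

struct ColType {
  Kind kind;
  std::size_t length;  // only meaningful for Varchar; 0 for numeric kinds

  bool operator==(const ColType &o) const {
    return kind == o.kind && length == o.length;
  }
};

static bool is_numeric(Kind k) { return k != Kind::Varchar; }

// Common type of two columns in the same position, or nullopt when this
// simplified sketch does not model the combination (e.g. int with varchar).
std::optional<ColType> common_type(const ColType &a, const ColType &b) {
  if (is_numeric(a.kind) && is_numeric(b.kind))
    return ColType{std::max(a.kind, b.kind), 0};
  if (a.kind == Kind::Varchar && b.kind == Kind::Varchar)
    return ColType{Kind::Varchar, std::max(a.length, b.length)};
  return std::nullopt;
}

// Result column types of a union unit: every select must return the same
// number of columns, and column k gets the common type of all k-th columns.
std::optional<std::vector<ColType>>
union_column_types(const std::vector<std::vector<ColType>> &selects) {
  assert(!selects.empty());  // a union unit has at least one select
  std::vector<ColType> result = selects.front();
  for (std::size_t s = 1; s < selects.size(); ++s) {
    if (selects[s].size() != result.size()) return std::nullopt;
    for (std::size_t k = 0; k < result.size(); ++k) {
      std::optional<ColType> t = common_type(result[k], selects[s][k]);
      if (!t) return std::nullopt;
      result[k] = *t;
    }
  }
  return result;
}
```

With the column types assumed for query (1), the second result column comes out
as double and the third as varchar(20), which matches the example in the text.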
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it is executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added, for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select the method first stores
the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record.
All needed type conversions of the field values are performed when they are
stored in the record buffer. After this the method select_union::send_data calls
the ha_write_row handler function to write the record from the buffer to the
temporary table. A possible duplicate key error that occurs on an attempt to
write a duplicate row is ignored.
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then
the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
==============================================================

The following three optimizations are proposed to be implemented in the
framework of this task.

2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not followed by an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. It would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. In the cases when the result set is
so big that the temporary table cannot be allocated in main memory the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
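The difference between the two delivery paths can be illustrated with a toy
model. This is only a sketch under stated assumptions: Row, Output, RowSink,
BufferedSink, StreamingSink and run_union_all are hypothetical names, the
"temporary table" is an in-memory vector, and type conversion and duplicate
elimination are not modelled.

```cpp
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical stand-in for the client connection (the output stream).
struct Output {
  std::vector<Row> sent;
};

// Hypothetical interface loosely modelled on the send_data/send_eof pair.
class RowSink {
 public:
  virtual ~RowSink() = default;
  virtual void send_data(const Row &row) = 0;
  virtual void send_eof() = 0;
};

// Current behaviour in miniature: rows are accumulated first and reach the
// output only after the last select has finished.
class BufferedSink : public RowSink {
 public:
  explicit BufferedSink(Output &out) : out_(out) {}
  void send_data(const Row &row) override { temp_table_.push_back(row); }
  void send_eof() override {
    for (const Row &r : temp_table_) out_.sent.push_back(r);
    temp_table_.clear();
  }

 private:
  Output &out_;
  std::vector<Row> temp_table_;
};

// Proposed behaviour in miniature: each row goes to the output at once.
class StreamingSink : public RowSink {
 public:
  explicit StreamingSink(Output &out) : out_(out) {}
  void send_data(const Row &row) override { out_.sent.push_back(row); }
  void send_eof() override {}

 private:
  Output &out_;
};

// Runs every select of a UNION ALL-only unit through one shared sink.
void run_union_all(const std::vector<std::vector<Row>> &selects,
                   RowSink &sink) {
  for (const std::vector<Row> &rows : selects)
    for (const Row &r : rows) sink.send_data(r);
  sink.send_eof();
}
```

Both sinks deliver the same rows in the same order for a UNION ALL-only unit;
the streaming one simply makes the first row visible to the client before the
remaining selects have run, and it never holds the whole result set.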
To make a UNION ALL operation not send rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversions and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it's done in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed, another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
undergo a serious re-work.


3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause
send any row received from an operand of a UNION operation directly to the
output stream as soon as it has been checked by a lookup in the temporary table
that it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300 +++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300 @@ -19,28 +19,29 @@ UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example the following queries are accepted by the MySQL Server: - (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (1) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union all + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (2) Any mix of UNION and UNION ALL is also acceptable: - (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where -a2!=b2) union all + (select a1,b1,c3 from t1 where a1=b1) union + (select a2,b2,c3 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (3) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union 
(select a3,b3,c3 from t3 where a3>b3); (4) + It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1),(2),(4). In general any UNION ALL in a sequence of union operations can be equivalently substituted for UNION if there occur another UNION further in the sequence. -MySQL does not accept nested unions. For example the following valid query is -considered by MySQL Server as erroneous: - ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) -) union all - ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) +MySQL does not accept nested unions. For example the following valid SQL query +is considered by MySQL Server as erroneous: + ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)) + union all + ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4)) A sequence of select constructs separated by UNION/UNION ALL is called 'union unit' if it s not a part of another such sequence. -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. 
MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. 
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. 
For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. 
+The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. 
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. +To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. 
+ +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. 
+The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. 
+ +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. Other possible optimizations for union units +================================= + +The following optimizations are not supposed to be implemented in the framework +this task. +1. For a union unit containing only UNION ALL with an ORDER BY send rows from +selects directly to the sorting procedure. +2. For a union unit at the top level of the query without ORDER BY clause send +any row received from an operand of a UNION operation directly to the output +stream as soon as it has been checked by a lookup in the temporary table that +it's not a duplicate. +3. Not to use temporary table for any union unit used in EXIST or IN subquery. + DESCRIPTION: Currently when any union operation is executed the rows received from its operands are always sent to a temporary table. Meanwhile for a UNION ALL operation that is used at the top level of a query without an ORDER BY clause it is not necessary. In this case the rows could be sent directly to the client. 
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most common
cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);            (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);            (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);            (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);            (4)

It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced with UNION if another UNION occurs later in the sequence.
MySQL does not accept nested unions.
For example the following valid SQL query is considered erroneous by MySQL
Server:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then it
checks that all selects are compatible. The method checks that the selects
return the same number of columns and that for each set of columns with the same
number k there is a type to which the types of the columns can be coerced. This
type is considered as the type of column k of the result set returned by the
union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
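The type aggregation described above can be illustrated with a small stand-alone sketch. It is not the server code: the names Kind, ColType, join_two and aggregate_column_type are hypothetical, only int, bigint, double and varchar(n) are modelled, and a mix of numeric and string columns is deliberately left unhandled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified column type; not the server's real type system.
enum class Kind { Int, BigInt, Double, Varchar };

struct ColType {
  Kind kind;
  unsigned length;  // only meaningful for Varchar
};

// Numeric kinds are ranked so that the wider kind has the larger rank.
static int numeric_rank(Kind k) {
  switch (k) {
    case Kind::Int:    return 0;
    case Kind::BigInt: return 1;
    case Kind::Double: return 2;
    default:           return -1;  // not numeric
  }
}

// Computes the common type of two column types. Returns false for
// combinations this sketch does not model (numeric mixed with string).
static bool join_two(const ColType &a, const ColType &b, ColType *out) {
  const int ra = numeric_rank(a.kind);
  const int rb = numeric_rank(b.kind);
  if (ra >= 0 && rb >= 0) {  // both numeric: keep the wider kind
    *out = (ra >= rb) ? a : b;
    return true;
  }
  if (ra < 0 && rb < 0) {    // both varchar: keep the longer length
    out->kind = Kind::Varchar;
    out->length = std::max(a.length, b.length);
    return true;
  }
  return false;
}

// Folds the types of column k of all selects into the type of column k
// of the result set of the union unit.
bool aggregate_column_type(const std::vector<ColType> &cols, ColType *out) {
  assert(out != 0);
  if (cols.empty()) return false;
  ColType acc = cols[0];
  for (std::size_t i = 1; i < cols.size(); ++i) {
    ColType next;
    if (!join_two(acc, cols[i], &next)) return false;
    acc = next;
  }
  *out = acc;
  return true;
}
```

Called with the types of b1, b2, b3 from the example (int, bigint, double) the function yields double; called with varchar(10), varchar(20), varchar(10) it yields varchar(20). Collation resolution is not modelled here.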
After the compatibility of the corresponding select columns has been checked and
the types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force the selects to
send rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects
for the selects such that the JOIN::result field refers to an object of the
class select_union. All selects from a union unit share the same select_union
object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it is executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling the use of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record.
All needed type conversions of the field values are performed when they are
stored in the record buffer. After this the method select_union::send_data calls
the ha_write_row handler function to write the record from the buffer to the
temporary table. A possible duplicate-key error that occurs on an attempt to
write a duplicate row is ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then
the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
===============================================================

The following three optimizations are proposed to be implemented in the
framework of this task.

2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not followed by an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. In the cases when the result set is
so big that the temporary table cannot be allocated in main memory the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
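The difference between the current behaviour and the one proposed in 2.1 can be shown with a toy model. This is only a sketch under stated assumptions: RowSink, BufferingSink, DirectSink and exec_union_all are hypothetical names, a row is a plain string, and the "client" is a vector. BufferingSink plays the role of select_union with its temporary table, DirectSink plays the role of the proposed direct sending.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::string Row;  // a row is modelled as plain text

// Common interface, loosely modelled on the role of a result interceptor.
class RowSink {
public:
  virtual ~RowSink() {}
  virtual void send_data(const Row &row) = 0;
  virtual void send_eof() = 0;  // called once, after the last select
};

// Current behaviour: rows are collected in a "temporary table" and reach
// the client only after all selects have been executed.
class BufferingSink : public RowSink {
  std::vector<Row> temp_table_;
  std::vector<Row> &client_;
public:
  explicit BufferingSink(std::vector<Row> &client) : client_(client) {}
  void send_data(const Row &row) { temp_table_.push_back(row); }
  void send_eof() {
    client_.insert(client_.end(), temp_table_.begin(), temp_table_.end());
    temp_table_.clear();
  }
};

// Proposed behaviour for units with only UNION ALL: every row goes
// straight to the client, nothing is stored.
class DirectSink : public RowSink {
  std::vector<Row> &client_;
public:
  explicit DirectSink(std::vector<Row> &client) : client_(client) {}
  void send_data(const Row &row) { client_.push_back(row); }
  void send_eof() {}
};

// Executes the selects of a UNION ALL unit one by one against one
// shared sink, then signals the end of the result set.
void exec_union_all(const std::vector<std::vector<Row> > &selects,
                    RowSink &sink) {
  for (std::size_t i = 0; i < selects.size(); ++i)
    for (std::size_t r = 0; r < selects[i].size(); ++r)
      sink.send_data(selects[i][r]);
  sink.send_eof();
}
```

Both sinks deliver the same rows in the end. With DirectSink the client vector already holds a row right after the first send_data call, while with BufferingSink it stays empty until send_eof.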
To make a UNION ALL operation not send rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversions and which don't. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it's done in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
undergo a serious re-work.


3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without ORDER BY clause send
any row received from an operand of a UNION operation directly to the output
stream as soon as it has been checked by a lookup in the temporary table that
it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)


[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

14 Aug '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300 +++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300 @@ -19,28 +19,29 @@ UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example the following queries are accepted by the MySQL Server: - (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (1) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union all + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (2) Any mix of UNION and UNION ALL is also acceptable: - (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where -a2!=b2) union all + (select a1,b1,c3 from t1 where a1=b1) union + (select a2,b2,c3 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (3) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union 
(select a3,b3,c3 from t3 where a3>b3); (4) + It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1),(2),(4). In general any UNION ALL in a sequence of union operations can be equivalently substituted for UNION if there occur another UNION further in the sequence. -MySQL does not accept nested unions. For example the following valid query is -considered by MySQL Server as erroneous: - ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) -) union all - ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) +MySQL does not accept nested unions. For example the following valid SQL query +is considered by MySQL Server as erroneous: + ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)) + union all + ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4)) A sequence of select constructs separated by UNION/UNION ALL is called 'union unit' if it s not a part of another such sequence. -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. 
MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. 
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. 
For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. 
+The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. 
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. +To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. 
+ +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. 
+The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. 
+ +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. Other possible optimizations for union units +================================= + +The following optimizations are not supposed to be implemented in the framework +this task. +1. For a union unit containing only UNION ALL with an ORDER BY send rows from +selects directly to the sorting procedure. +2. For a union unit at the top level of the query without ORDER BY clause send +any row received from an operand of a UNION operation directly to the output +stream as soon as it has been checked by a lookup in the temporary table that +it's not a duplicate. +3. Not to use temporary table for any union unit used in EXIST or IN subquery. + DESCRIPTION: Currently when any union operation is executed the rows received from its operands are always sent to a temporary table. Meanwhile for a UNION ALL operation that is used at the top level of a query without an ORDER BY clause it is not necessary. In this case the rows could be sent directly to the client. 
The goal of this task is to provide an implementation of the UNION ALL operation
that does not use a temporary table at all in certain, most common cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);             (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);             (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);             (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);             (4)

It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
MySQL does not accept nested unions.
For example the following valid SQL query is considered erroneous by MySQL
Server:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then it
checks that all selects are compatible. The method checks that the selects
return the same number of columns and that for each set of columns with the same
number k there is a type to which the types of the columns can be coerced. This
type is considered as the type of column k of the result set returned by the
union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
After compatibility of the corresponding select columns has been checked and the types of the columns of the result set have been determined, the method SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the result set for the union unit. Currently, rows returned by the selects from the union unit are always written into a temporary table. To force selects to send rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for the selects such that the JOIN::result field refers to an object of the class select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned by the selects of the unit, and has prepared all data structures needed for execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. The method SELECT_LEX_UNIT::exec processes the selects from the union unit one by one. Each select is first optimized with JOIN::optimize(), then executed with JOIN::exec(). The result rows from each select are sent to a temporary table. This table accumulates all rows that are to be returned by the union unit. For UNION operations duplicate rows are not added; for UNION ALL operations all records are added. This is achieved by enabling and disabling usage of the unique index defined on all fields of the temporary table. The index is never used if only UNION ALL operations occur in the unit. Otherwise it is enabled before the first select is executed and disabled after the last UNION operation.

To send rows to the temporary table the method select_union::send_data is used. For a row it receives from the currently executed select, the method first stores the fields of the row in the fields of the record buffer of the temporary table. To do this the method calls the function fill_record.
All needed type conversions of the field values are performed when they are stored in the record buffer. After this the method select_union::send_data calls the ha_write_row handler function to write the record from the buffer to the temporary table. A possible duplicate-key error that occurs on an attempt to write a duplicate row is ignored.

After all rows received from all selects have been placed into the temporary table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows from the temporary table and sends them to the output stream (to the client). If there is an ORDER BY clause to be applied to the result of the union unit, then the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
===============================================================

The following three optimizations are proposed to be implemented in the framework of this task.

2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the query (in other words, it's not used as a subquery) and is not followed by an ORDER BY clause, then it does not make sense to send rows received from selects to a temporary table at all. After all needed type conversions have been done, the row fields could be sent directly into the output stream. This would improve the performance of UNION ALL operations since writing to the temporary table and reading from it would no longer be needed. In cases when the result set is so big that the temporary table cannot be allocated in main memory, the performance gains would be significant. Besides, the client could get the first result rows at once, as it would not have to wait until all selects have been executed.
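The difference between the current temporary-table scheme and the proposed direct sending can be illustrated with a toy model. This is a sketch under stated assumptions, not server code: Row, Select, Op, exec_with_temp_table and exec_union_all_streaming are hypothetical names, rows are plain vectors of strings, a std::set stands in for the unique index, and ops is assumed to hold exactly one operator between each pair of adjacent selects.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Row = std::vector<std::string>;
using Select = std::vector<Row>;  // the rows one select would produce

enum class Op { Union, UnionAll };  // ops[i] joins selects[i] and selects[i+1]

// Current scheme: every row passes through a temporary table. Duplicate
// checking (the unique index) stays on up to the second operand of the last
// UNION and is off afterwards; it is never on if only UNION ALL occurs.
std::vector<Row> exec_with_temp_table(const std::vector<Select> &selects,
                                      const std::vector<Op> &ops) {
  std::size_t dedup_until = 0;  // number of leading selects that are checked
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == Op::Union) dedup_until = i + 2;

  std::vector<Row> temp_table;
  std::set<Row> unique_index;
  for (std::size_t s = 0; s < selects.size(); ++s)
    for (const Row &row : selects[s]) {
      // A duplicate-key failure is silently ignored, as in send_data.
      if (s < dedup_until && !unique_index.insert(row).second) continue;
      temp_table.push_back(row);
    }
  return temp_table;  // only now can rows be read back and sent to the client
}

// Proposed scheme for a top-level unit with only UNION ALL and no ORDER BY:
// each row is handed to the output as soon as its select produces it.
template <typename Emit>
void exec_union_all_streaming(const std::vector<Select> &selects, Emit emit) {
  for (const Select &select : selects)
    for (const Row &row : select) emit(row);
}
```

In this model, the operator sequences of queries (1) and (4) produce the same rows, the sequence of query (3) keeps duplicates contributed by the last select, and for an all-UNION ALL unit the streaming variant emits exactly the rows the temporary table would have held, without buffering any of them.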
To make a UNION ALL operation not send rows to a temporary table, we could provide the JOIN objects created for the selects from the union unit with an interceptor object that differs from the one they use now. In the current code they use an object of the class select_union derived from the select_result_interceptor class. The new interceptor object, of a class that we'll call select_union_send (by analogy with the class select_send), shall inherit from select_union and shall have its own implementations of the virtual methods send_data, send_fields, and send_eof. The method send_data shall send fields received from selects to the record buffer of the temporary table and then from this buffer to the output stream. The method send_fields shall send the format of the rows to the client before it starts getting records from the first select, while the method send_eof shall signal the end of the rows after the last select finishes sending records. The method create_result_table of the class select_union shall be re-defined as virtual. The implementation of this method for the class select_union_send shall call select_union::create_result_table and then shall build internal structures needed for select_union_send::send_data. So, the definition of the class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion, it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a bitmap) that says which fields require conversions and which don't. Before execution of a select this data structure must be passed to the select_union_send object shared by all selects from the unit. The info in this structure will tell select_union_send::send_data which fields should be sent to the record buffer for type conversion and which can be sent directly to the output stream. In this case another variant of the fill_record procedure is needed that would take as a parameter the info that says which fields are to be stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is used at the top level of a query, then any UNION ALL operation after the last UNION operation can be executed in a more efficient way than in the current implementation. More exactly, the rows from any select that follows the second operand of the last UNION operation could be sent directly to the output stream. In this case two interceptor objects have to be created: one, of the type select_union, is shared by the selects for which UNION operations are performed; another, of the type select_union_send, is shared by the remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to undergo a serious re-work.

3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause, send any row received from an operand of a UNION operation directly to the output stream as soon as a lookup in the temporary table has confirmed that it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)


[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09


----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 14 Aug 2009, 08:50)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22656 2009-08-14 08:50:48.000000000 +0300 +++ /tmp/wklog.44.new.22656 2009-08-14 08:50:48.000000000 +0300 @@ -19,28 +19,29 @@ UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example the following queries are accepted by the MySQL Server: - (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union + (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (1) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union all + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (2) Any mix of UNION and UNION ALL is also acceptable: - (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where -a2!=b2) union all + (select a1,b1,c3 from t1 where a1=b1) union + (select a2,b2,c3 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (3) - (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where -a2!=b2) union + (select a1,b1,c1 from t1 where a1=b1) union all + (select a2,b2,c2 from t2 where a2!=b2) union 
(select a3,b3,c3 from t3 where a3>b3); (4) + It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1),(2),(4). In general any UNION ALL in a sequence of union operations can be equivalently substituted for UNION if there occur another UNION further in the sequence. -MySQL does not accept nested unions. For example the following valid query is -considered by MySQL Server as erroneous: - ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) -) union all - ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) +MySQL does not accept nested unions. For example the following valid SQL query +is considered by MySQL Server as erroneous: + ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)) + union all + ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4)) A sequence of select constructs separated by UNION/UNION ALL is called 'union unit' if it s not a part of another such sequence. -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. 
MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. 
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. 
For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. 
+The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. 
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. +To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. 
+ +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. 
+The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. 
+ +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. Other possible optimizations for union units +================================= + +The following optimizations are not supposed to be implemented in the framework +this task. +1. For a union unit containing only UNION ALL with an ORDER BY send rows from +selects directly to the sorting procedure. +2. For a union unit at the top level of the query without ORDER BY clause send +any row received from an operand of a UNION operation directly to the output +stream as soon as it has been checked by a lookup in the temporary table that +it's not a duplicate. +3. Not to use temporary table for any union unit used in EXIST or IN subquery. + DESCRIPTION: Currently when any union operation is executed the rows received from its operands are always sent to a temporary table. Meanwhile for a UNION ALL operation that is used at the top level of a query without an ORDER BY clause it is not necessary. In this case the rows could be sent directly to the client. 
The goal of this task is to provide an implementation of the UNION ALL operation that does not use a temporary table at all in certain, most common cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example, the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);          (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);          (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c3 from t1 where a1=b1) union
  (select a2,b2,c3 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);          (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);          (4)

It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1), (2), (4). In general, any UNION ALL in a sequence of union operations can be equivalently replaced with UNION if another UNION occurs further in the sequence.
MySQL does not accept nested unions.
For example, the following valid SQL query is considered by MySQL Server as erroneous:
  ((select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2))
  union all
  ((select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4))

A sequence of select constructs separated by UNION/UNION ALL is called a 'union unit' if it is not a part of another such sequence. A union unit can be executed as a query. It can also be used as a subquery. A union unit can optionally be followed by an ORDER BY and/or LIMIT construct. In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, the further processing of a union unit is performed by the function mysql_union. The function first validates the unit by calling the method SELECT_LEX_UNIT::prepare. The method first validates each of the select constructs of the unit and then checks that all selects are compatible. It checks that the selects return the same number of columns and that for each set of columns at the same position k there is a type to which the types of the columns can be coerced. This type is taken as the type of column k of the result set returned by the union unit. For example, if in query (1) the columns b1, b2 and b3 are of the types int, bigint and double respectively, then the second column of the union unit will be of the type double. If the types of the columns c1, c2, c3 are specified as varchar(10), varchar(20), varchar(10), then the type of the corresponding column of the result set will be varchar(20). If the columns have different collations, then a collation from which all these collations can be derived is looked for, and it is assigned as the collation of the third column in the result set.
After compatibility of the corresponding select columns has been checked and the types of the columns of the result set have been determined, the method SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the result set for the union unit. Currently, rows returned by the selects from the union unit are always written into a temporary table. To force selects to send rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for the selects such that the JOIN::result field refers to an object of the class select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned by the selects of the unit, and has prepared all data structures needed for execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. The method SELECT_LEX_UNIT::exec processes the selects from the union unit one by one. Each select is first optimized with JOIN::optimize(), then executed with JOIN::exec(). The result rows from each select are sent to a temporary table. This table accumulates all rows that are to be returned by the union unit. For UNION operations duplicate rows are not added; for UNION ALL operations all records are added. This is achieved by enabling and disabling usage of the unique index defined on all fields of the temporary table. The index is never used if only UNION ALL operations occur in the unit. Otherwise it is enabled before the first select is executed and disabled after the last UNION operation.

To send rows to the temporary table the method select_union::send_data is used. For a row it receives from the currently executed select, the method first stores the fields of the row in the fields of the record buffer of the temporary table. To do this the method calls the function fill_record.
All needed type conversions of the field values are performed when they are stored in the record buffer. After this the method select_union::send_data calls the ha_write_row handler function to write the record from the buffer to the temporary table. A possible duplicate-key error that occurs on an attempt to write a duplicate row is ignored.
After all rows received from all selects have been placed into the temporary table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows from the temporary table and sends them to the output stream (to the client). If there is an ORDER BY clause to be applied to the result of the union unit, then the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
===============================================================

The following three optimizations are proposed to be implemented in the framework of this task.

2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the query (in other words, it is not used as a subquery) and is not followed by an ORDER BY clause, then it does not make sense to send rows received from selects to a temporary table at all. After all needed type conversions have been done, the row fields could be sent directly into the output stream. This would improve the performance of UNION ALL operations, since writing to the temporary table and reading from it would not be needed anymore. In cases when the result set is so big that the temporary table cannot be allocated in main memory, the performance gains would be significant. Besides, the client could get the first result rows at once, as it would not have to wait until all selects have been executed.
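The index handling from section 1.3 and the condition stated in 2.1 can be modelled in a few lines. The following is a toy sketch, not server code: rows are single integers, a std::set stands in for the unique index of the temporary table, and the names run_union_unit and can_send_directly are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Row = int;                  // a one-column row, for brevity
using Select = std::vector<Row>;  // rows produced by one select

// ops[i] joins select i and select i+1: true = UNION, false = UNION ALL.
// Models the temporary table: the "unique index" rejects duplicates while it
// is enabled, and it is switched off once the second operand of the last
// UNION operation has been processed.
std::vector<Row> run_union_unit(const std::vector<Select> &selects,
                                const std::vector<bool> &ops) {
  assert(selects.empty() ? ops.empty() : ops.size() + 1 == selects.size());
  // Position of the last select that still needs duplicate elimination;
  // -1 means the unit contains UNION ALL operations only.
  std::ptrdiff_t last_checked = -1;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i]) last_checked = static_cast<std::ptrdiff_t>(i) + 1;

  std::vector<Row> table;
  std::set<Row> unique_index;
  for (std::size_t s = 0; s < selects.size(); ++s) {
    const bool index_enabled = static_cast<std::ptrdiff_t>(s) <= last_checked;
    for (Row r : selects[s]) {
      if (index_enabled && !unique_index.insert(r).second)
        continue;  // duplicate key: the write error is ignored
      table.push_back(r);
    }
  }
  return table;
}

// Condition from section 2.1 for bypassing the temporary table entirely.
bool can_send_directly(const std::vector<bool> &ops, bool top_level,
                       bool has_order_by) {
  for (bool is_union : ops)
    if (is_union) return false;
  return top_level && !has_order_by;
}
```

In this model a unit of the shape of query (4), UNION ALL followed by UNION, yields the same rows as one with UNION throughout, while rows of selects after the last UNION are appended without any duplicate check.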
To make a UNION ALL operation not send rows to a temporary table, we could provide the JOIN objects created for the selects from the union unit with an interceptor object that differs from the one they use now. In the current code they use an object of the class select_union, derived from the select_result_interceptor class. The new interceptor object, of a class that we'll call select_union_send (by analogy with the class select_send), shall inherit from select_union and shall have its own implementations of the virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record buffer of the temporary table and then from this buffer to the output stream. The method send_fields shall send the format of the rows to the client before it starts getting records from the first select, while the method send_eof shall signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined as virtual. The implementation of this method for the class select_union_send shall call select_union::create_result_table and then build the internal structures needed for select_union_send::send_data. So, the definition of the class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion, it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a bitmap) that says which fields require conversions and which don't. Before execution of a select this data structure must be passed to the select_union_send object shared by all selects from the unit. The info in this structure will tell select_union_send::send_data which fields should be sent to the record buffer for type conversion and which can be sent directly to the output stream. In this case another variant of the fill_record procedure is needed that would take as a parameter the info that says which fields are to be stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is used at the top level of a query, then any UNION ALL operation after the last UNION operation can be executed in a more efficient way than in the current implementation. More precisely, the rows from any select that follows the second operand of the last UNION operation could be sent directly to the output stream. In this case two interceptor objects have to be created: one, of the type select_union, is shared by the selects for which UNION operations are performed; another, of the type select_union_send, is shared by the remaining selects. For this optimization the method SELECT_LEX_UNIT::exec will need substantial rework.

3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause, send any row received from an operand of a UNION operation directly to the output stream as soon as a lookup in the temporary table has confirmed that it is not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 
1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. -2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. 
Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). 
At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. +The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). 
If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. +The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. 
+To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. + +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. 
In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. +The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. 
Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. + +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. 
Other possible optimizations for union units +================================= + +The following optimizations are not supposed to be implemented in the framework +this task. +1. For a union unit containing only UNION ALL with an ORDER BY send rows from +selects directly to the sorting procedure. +2. For a union unit at the top level of the query without ORDER BY clause send +any row received from an operand of a UNION operation directly to the output +stream as soon as it has been checked by a lookup in the temporary table that +it's not a duplicate. +3. Not to use temporary table for any union unit used in EXIST or IN subquery. +

DESCRIPTION:

Currently when any union operation is executed the rows received from its operands are always sent to a temporary table. Meanwhile, for a UNION ALL operation that is used at the top level of a query without an ORDER BY clause this is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL operation that does not use a temporary table at all in certain, most common cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another.
For example, the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3); (1)
  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3); (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3); (3)
  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3); (4)
It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be equivalently replaced by UNION if another UNION occurs further in the sequence.
MySQL does not accept nested unions. For example, the following valid query is rejected by MySQL Server as erroneous:
  ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) ) union all
  ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )

A sequence of select constructs separated by UNION/UNION ALL is called a 'union unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct. In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, further processing of a union unit is performed by the function mysql_union.
The function first validates the unit in the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then checks that all selects are compatible.
The method checks that the selects return the same number of columns and that, for each position k, there is a type to which the types of the k-th columns of all selects can be coerced. This type is taken as the type of column k of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int, bigint and double respectively, then the second column of the union unit will be of the type double. If the types of the columns c1, c2, c3 are specified as varchar(10), varchar(20), varchar(10), then the type of the corresponding column of the result set will be varchar(20). If the columns have different collations, then a collation from which all these collations can be derived is looked for, and it is assigned as the collation of the third column in the result set.
After the compatibility of the corresponding select columns has been checked and the types of the columns of the result set have been determined, the method SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the result set for the union unit. Currently, rows returned by the selects from the union unit are always written into a temporary table. To force selects to send rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for the selects such that the JOIN::result field refers to an object of the class select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned by the selects of the unit, and has prepared all data structures needed for execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one by one.
Each select is first optimized with JOIN::optimize(), then executed with JOIN::exec(). The result rows from each select are sent to a temporary table. This table accumulates all rows that are to be returned by the union unit. For UNION operations duplicate rows are not added; for UNION ALL operations all records are added. This is achieved by enabling and disabling usage of the unique index defined on all fields of the temporary table. The index is never used if only UNION ALL operations occur in the unit. Otherwise it is enabled before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used. For a row it receives from the currently executed select, the method first stores the fields of the row in the fields of the record buffer of the temporary table. To do this the method calls the function fill_record. All needed type conversions of the field values are performed when they are stored in the record buffer. After this the method select_union::send_data calls the ha_write_row handler function to write the record from the buffer to the temporary table. A possible duplicate-key error that occurs on an attempt to write a duplicate row is ignored.
After all rows received from all selects have been placed into the temporary table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows from the temporary table and sends them to the output stream (to the client). If there is an ORDER BY clause to be applied to the result of the union unit, then the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
===============================================================

The following three optimizations are proposed to be implemented in the framework of this task.

2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the query (in other words, it is not used as a subquery) and is not followed by an ORDER BY clause, then it does not make sense to send rows received from selects to a temporary table at all. After all needed type conversions have been done, the row fields could be sent directly into the output stream. This would improve the performance of UNION ALL operations, since writing to the temporary table and reading from it would not be needed anymore. In cases when the result set is so big that the temporary table cannot be allocated in main memory, the performance gains would be significant. Besides, the client could get the first result rows at once, as it would not have to wait until all selects have been executed.
To make a UNION ALL operation not send rows to a temporary table, we could provide the JOIN objects created for the selects from the union unit with an interceptor object that differs from the one they use now. In the current code they use an object of the class select_union, derived from the select_result_interceptor class. The new interceptor object, of a class that we'll call select_union_send (by analogy with the class select_send), shall inherit from select_union and shall have its own implementations of the virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record buffer of the temporary table and then from this buffer to the output stream. The method send_fields shall send the format of the rows to the client before it starts getting records from the first select, while the method send_eof shall signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined as virtual.
The implementation of this method for the class select_union_send shall call select_union::create_result_table and then build the internal structures needed for select_union_send::send_data. So, the definition of the class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion, it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a bitmap) that says which fields require conversions and which don't. Before execution of a select this data structure must be passed to the select_union_send object shared by all selects from the unit. The info in this structure will tell select_union_send::send_data which fields should be sent to the record buffer for type conversion and which can be sent directly to the output stream. In this case another variant of the fill_record procedure is needed that would take as a parameter the info that says which fields are to be stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is used at the top level of a query, then any UNION ALL operation after the last UNION operation can be executed in a more efficient way than in the current implementation.
More exactly, the rows from any select that follows after the second operand of the last UNION operations could be sent directly to the output stream. In this case two interceptor objects have to be created: one, of the type select_union, is shared by the selects for which UNION operations are performed, another, of the type select_union_send, is shared by the the remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to undergo a serious re-work. 3. Other possible optimizations for union units =============================================== The following optimizations are not supposed to be implemented in the framework this task. 1. For a union unit containing only UNION ALL with an ORDER BY send rows from selects directly to the sorting procedure. 2. For a union unit at the top level of the query without ORDER BY clause send any row received from an operand of a UNION operation directly to the output stream as soon as it has been checked by a lookup in the temporary table that it's not a duplicate. 3. Not to use temporary table for any union unit used in EXIST or IN subquery. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)
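
The scheme of sections 2.1 and 2.3 can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not server code: ToySink, ToyTempTableSink, ToyDirectSink, dedup_prefix, run_current, run_proposed and toy_selects are hypothetical names, rows are plain vectors of int, and the "temporary table" is a vector guarded by a std::set playing the role of the unique index. It compares the current execution (everything goes through the temporary table) with the proposed one (selects after the second operand of the last UNION are streamed directly).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-ins; none of these names exist in the server source.
using Row = std::vector<int>;
using Rows = std::vector<Row>;
enum class UnionOp { Distinct, All };  // UNION vs UNION ALL

// Interceptor interface: every select pushes its rows through send_data().
struct ToySink {
  virtual ~ToySink() = default;
  virtual void send_data(const Row &row) = 0;
};

// Models select_union: rows are accumulated in a "temporary table".
struct ToyTempTableSink : ToySink {
  Rows table;
  std::set<Row> unique_index;
  bool index_enabled = false;
  void send_data(const Row &row) override {
    // With the index enabled a duplicate-key "error" is silently ignored.
    if (index_enabled && !unique_index.insert(row).second) return;
    table.push_back(row);
  }
};

// Models the proposed select_union_send: rows go straight to the output.
struct ToyDirectSink : ToySink {
  Rows &output;
  explicit ToyDirectSink(Rows &out) : output(out) {}
  void send_data(const Row &row) override { output.push_back(row); }
};

// Number of leading selects that must be de-duplicated: everything up to and
// including the second operand of the last UNION. ops[i] joins selects i, i+1.
std::size_t dedup_prefix(const std::vector<UnionOp> &ops) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == UnionOp::Distinct) n = i + 2;
  return n;
}

static void run_select(const Rows &select, ToySink &sink) {
  for (const Row &row : select) sink.send_data(row);
}

// Current scheme: all rows pass through the temporary table; the unique
// index is switched off after the last UNION operation.
Rows run_current(const std::vector<Rows> &selects,
                 const std::vector<UnionOp> &ops) {
  assert(ops.size() + 1 == selects.size());
  const std::size_t n = dedup_prefix(ops);
  ToyTempTableSink tmp;
  for (std::size_t k = 0; k < selects.size(); ++k) {
    tmp.index_enabled = k < n;
    run_select(selects[k], tmp);
  }
  return tmp.table;
}

// Proposed scheme: only the de-duplicated prefix uses the temporary table,
// the remaining UNION ALL operands are sent directly to the output.
Rows run_proposed(const std::vector<Rows> &selects,
                  const std::vector<UnionOp> &ops) {
  assert(ops.size() + 1 == selects.size());
  const std::size_t n = dedup_prefix(ops);
  ToyTempTableSink tmp;
  tmp.index_enabled = true;
  for (std::size_t k = 0; k < n; ++k) run_select(selects[k], tmp);
  Rows output = tmp.table;
  ToyDirectSink direct(output);
  for (std::size_t k = n; k < selects.size(); ++k) run_select(selects[k], direct);
  return output;
}

// Three small result sets used to try the two schemes.
std::vector<Rows> toy_selects() {
  Rows s1 = {Row{1, 1}, Row{2, 2}};
  Rows s2 = {Row{1, 1}};
  Rows s3 = {Row{2, 2}, Row{3, 3}};
  return {s1, s2, s3};
}
```

In this model both functions return the same rows for any mix of operations; when the unit contains only UNION ALL, dedup_prefix is 0 and run_proposed never touches the temporary table, which is the case of section 2.1.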


[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

14 Aug '09

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement UNION ALL without usage of a temporary table
CREATION DATE..: Fri, 14 Aug 2009, 08:31
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....:
COPIES TO......: Monty, Psergey
CATEGORY.......: Client-BackLog
TASK ID........: 44 (http://askmonty.org/worklog/?tid=44)
VERSION........: Benchmarks-3.0
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300
+++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300
@@ -6,15 +6,15 @@
 2. Optimizations improving performance of UNION ALL operations
   2.1 Execution of UNION ALL without temporary table
   2.2. Avoiding unnecessary copying
-  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
 3. Other possible optimizations for union units
 </contents>
 
 1. Handling union operations in MySQL Server
-==================================
+============================================
 
 1.1. Specifics of MySQL union operations
-------------------------------------------------------
+----------------------------------------
 
 UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
 allows us to use these operations in a sequence, one after another. For example
@@ -49,7 +49,7 @@
 In this case it cannot be used as a subquery.
 
 1.2 Validation of union units
-----------------------------------
+-----------------------------
 
 When the parser stage is over the further processing of a union unit is
 performed by the function mysql_union.
@@ -77,7 +77,7 @@
 select_union. All selects from a union unit share the same select_union object.
 
 1.3 Execution of union units
-----------------------------------
+----------------------------
 
 After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
 created a temporary table as a container for rows from the result sets returned
@@ -109,13 +109,13 @@
 rows read from the temporary table have to be sorted first.
 
 2. Optimizations improving performance of UNION ALL operations
-=================================================
+===============================================================
 
 The following three optimizations are proposed to be implemented in the
 framework of this task.
 
 2.1 Execution of UNION ALL without temporary table
-------------------------------------------------------------------
+--------------------------------------------------
 
 If a union unit with only UNION ALL operations is used at the top level of the
 query (in other words it's not used as a subquery) and is not appended with an
@@ -159,7 +159,7 @@
   };
 
 2.2. Avoiding unnecessary copying
-------------------------------------------
+---------------------------------
 
 If a field does not need type conversion it does not make sense to send it to a
 record buffer. It can be sent directly to the output stream. Different selects
@@ -174,8 +174,8 @@
 needed that would take as parameter the info that says what fields are to be
 stored in the record buffer.
 
-2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
-----------------------------------------------------------------------------------------------------------
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
+----------------------------------------------------------------------
 
 If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
 used at the top level of a query then any UNION ALL operation after the last
@@ -190,7 +190,7 @@
 
 
 3. Other possible optimizations for union units
-=================================
+===============================================
 
 The following optimizations are not supposed to be implemented in the framework
 this task.

-=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=-
High-Level Specification modified.
--- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300
+++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300
@@ -1 +1,205 @@
+<contents>
+1. Handling union operations in MySQL Server
+  1.1. Specifics of MySQL union operations
+  1.2 Validation of union units
+  1.3 Execution of union units
+2. Optimizations improving performance of UNION ALL operations
+  2.1 Execution of UNION ALL without temporary table
+  2.2. Avoiding unnecessary copying
+  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+3. Other possible optimizations for union units
+</contents>
+
+1. Handling union operations in MySQL Server
+==================================
+
+1.1. Specifics of MySQL union operations
+------------------------------------------------------
+
+UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
+allows us to use these operations in a sequence, one after another. For example
+the following queries are accepted by the MySQL Server:
+  (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where
+a2!=b2) union
+  (select a3,b3,c3 from t3 where a3>b3); (1)
+  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union all
+  (select a3,b3,c3 from t3 where a3>b3); (2)
+Any mix of UNION and UNION ALL is also acceptable:
+  (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where
+a2!=b2) union all
+  (select a3,b3,c3 from t3 where a3>b3); (3)
+  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where
+a2!=b2) union
+  (select a3,b3,c3 from t3 where a3>b3); (4)
+It should be noted that query (4) is equivalent to query (1). At the same time
+query (3) is not equivalent to any of the queries (1),(2),(4).
+In general any UNION ALL in a sequence of union operations can be equivalently
+substituted for UNION if there occur another UNION further in the sequence.
+MySQL does not accept nested unions. For example the following valid query is
+considered by MySQL Server as erroneous:
+  ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2)
+) union all
+  ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )
+
+A sequence of select constructs separated by UNION/UNION ALL is called 'union
+unit' if it s not a part of another such sequence.
+A union unit can be executed as a query. It also can be used as a subquery.
+A union unit can be optionally appended by an ORDER BY and/or LIMIT construct.
+In this case it cannot be used as a subquery.
+
+1.2 Validation of union units
+----------------------------------
+
+When the parser stage is over the further processing of a union unit is
+performed by the function mysql_union.
+The function first validate the unit in the method SELECT_LEX_UNIT::prepare.
+The method first validates each of the select constructs of the unit and then it
+checks that all select are compatible. The method checks that the selects return
+the same number of columns and for each set of columns with the same number k
+there is a type to which the types of the columns can be coerced. This type is
+considered as the type of column k of the result set returned by the union unit.
+For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
+bigint and double respectively then the second column of the union unit will be
+of the type double. If the types of the columns c1,c2,c3 are specified as
+varchar(10), varchar(20), varchar(10) then the type of the corresponding column
+of the result set will be varchar(20). If the columns have different collations
+then a collation from which all these collations can be derived is looked for
+and it is assigned as the
+collation of the third column in the result set.
+After compatibility of the corresponding select columns has been checked and the
+types of the columns from of the result set have been determined the method
+SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
+result set for the union unit. Currently rows returned by the selects from the
+union unit are always written into a temporary table. To force selects to send
+rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
+the selects such that the JOIN::result field refers to an object of the class
+select_union. All selects from a union unit share the same select_union object.
+
+1.3 Execution of union units
+----------------------------------
+
+After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
+created a temporary table as a container for rows from the result sets returned
+by the selects of the unit, and has prepared all data structures needed for
+execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
+by one.
+Each select first is optimized with JOIN::optimize(), then it's executed with
+JOIN::exec().The result rows from each select are sent to a temporary table.
+This table accumulates all rows that are to be returned by the union unit. For
+UNION operations duplicate rows are not added, for UNION ALL operations all
+records are added. It is achieved by enabling and disabling usage of the unique
+index defined on all fields of the temporary table. The index is never used if
+only UINION ALL operation occurs in the unit. Otherwise it is enabled before
+the first select is executed and disabled after the last UNION operation.
+To send rows to the temporary table the method select_union::send_data is used.
+For a row it receives from the currently executed select the method first stores
+the fields of the row in in the fields of the record buffer of the temporary
+table. To do this the method calls function fill_record. All needed type
+conversions of the field values are performed when they are stored the record
+buffer. After this the method select_union::send_data calls the ha_write_row
+handler function to write the record from the buffer to the temporary table. A
+possible error on duplicate key that occurs with an attempt to write a duplicate
+row is ignored.
+After all rows received from all selects have been placed into the temporary
+table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows
+from the temporary table and sends them to the output stream (to the client). If
+there is an ORDER BY clause to be applied to result of the union unit then the
+rows read from the temporary table have to be sorted first.
+
+2. Optimizations improving performance of UNION ALL operations
+=================================================
+
+The following three optimizations are proposed to be implemented in the
+framework of this task.
+
+2.1 Execution of UNION ALL without temporary table
+------------------------------------------------------------------
+
+If a union unit with only UNION ALL operations is used at the top level of the
+query (in other words it's not used as a subquery) and is not appended with an
+ORDER BY clause then it does not make sense to send rows received from selects
+to a temporary table at all. After all needed type conversions have been done
+the row fields could be sent directly into the output stream. It would improve
+the performance of UNION ALL operations since writing to the temporary table and
+reading from it would not be needed anymore. In the cases when the result set is
+big enough and the temporary table cannot be allocated in the main memory the
+performance gains would be significant. Besides, the client could get the first
+result rows at once as it would not have to wait until all selects have been
+executed.
+To make an UNION ALL operation not to send rows to a temporary table we could
+provide the JOIN objects created for the selects from the union unit with an
+interceptor object that differs from the one they use now. In the current code
+they use an object of the class select_union derived from the
+select_result_interceptor class. The new interceptor object of the class that
+we'll call select_union_send (by analogy with the class select_send) shall
+inherit from the select_union and shall have its own implementations of the
+virtual methods send_data, send_fields, and send_eof.
+The method send_data shall send fields received from selects to the record
+buffer of the temporary table and then from this buffer to the output stream.
+The method send_fields shall send the format of the rows to the client before it
+starts getting records from the first select , while the method send_eof shall
+signal about the end of the rows after the last select finishes sending records.
+The method create_result_table of the class select_union shall be re-defined
+as virtual. The implementation of this method for the class select_union_send
+shall call select_union::create_result_table and then shall build internal
+structures needed for select_unionsend::send_data. So, the definition of the
+class select_union_send should look like this:
+  class select_union_send :public select_union
+  {
+    ... // private structures
+  public:
+    select_union_send() :select_union(), ...{...}
+    bool send_data(List<Item> &items);
+    bool send_fields(List<Item> &list, uint flags);
+    bool create_result_table(THD *thd, List<Item> *column_types,
+                             bool is_distinct, ulonglong options,
+                             const char *alias);
+  };
+
+2.2. Avoiding unnecessary copying
+------------------------------------------
+
+If a field does not need type conversion it does not make sense to send it to a
+record buffer. It can be sent directly to the output stream. Different selects
+can require type conversions for different columns.
+Let's provide each select from the union unit with a data structure (e.g. a
+bitmap) that says what fields require conversions, and what don't . Before
+execution of a select this data structure must be passed to the
+select_union_send object shared by all selects from the unit. The info in this
+structure will tell select_union_send::send_data what fields should be sent to
+the record buffer for type conversion and what can be sent directly to the
+output stream. In this case another variant of the fill_record procedure is
+needed that would take as parameter the info that says what fields are to be
+stored in the record buffer.
+
+2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
+----------------------------------------------------------------------------------------------------------
+
+If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is
+used at the top level of a query then any UNION ALL operation after the last
+UNION operation can be executed in more efficient way than it's done in the
+current implementation. More exactly, the rows from any select that follows
+after the second operand of the last UNION operations could be sent directly to
+the output stream. In this case two interceptor objects have to be created: one,
+of the type select_union, is shared by the selects for which UNION operations
+are performed, another, of the type select_union_send, is shared by the the
+remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
+undergo a serious re-work.
+
+
+3. Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+


DESCRIPTION:

Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL operation that
is used at the top level of a query without an ORDER BY clause this is not
necessary: the rows could be sent directly to the client.

The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most commonly
used cases.


HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For
example, the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);          (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);          (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);          (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);          (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general, any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs further in the sequence.
MySQL does not accept nested unions. For example, the following query, although
valid SQL, is rejected by MySQL Server as erroneous:
  ( (select a1,b1 from t1 where a1=b1) union
    (select a2,b2 from t2 where a2!=b2) ) union all
  ( (select a3,b3 from t3 where a3=b3) union
    (select a4,b4 from t4 where a4!=b4) )

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It also can be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. It checks that the selects return the
same number of columns and that for each set of columns in the same position k
there is a type to which the types of the columns can be coerced. This type is
considered as the type of column k of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will
be of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding
column of the result set will be varchar(20). If the columns have different
collations, then a collation from which all these collations can be derived is
looked for and is assigned as the collation of the third column in the result
set.
After compatibility of the corresponding select columns has been checked and
the types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION
operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select the method first
stores the fields of the row in the fields of the record buffer of the
temporary table. To do this the method calls the function fill_record. All
needed type conversions of the field values are performed when they are stored
in the record buffer. After this the method select_union::send_data calls the
ha_write_row handler function to write the record from the buffer to the
temporary table. A possible duplicate-key error that occurs on an attempt to
write a duplicate row is ignored.
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client).
If there is an ORDER BY clause to be applied to the result of the union unit,
then the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
===============================================================

The following three optimizations are proposed to be implemented in the
framework of this task.

2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the
query (in other words, it is not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from the
selects to a temporary table at all. After all needed type conversions have
been done, the row fields could be sent directly to the output stream. This
would improve the performance of UNION ALL operations, since writing to the
temporary table and reading from it would no longer be needed. When the result
set is so big that the temporary table cannot be kept in main memory, the
performance gain would be significant. Besides, the client could get the first
result rows at once, as it would not have to wait until all selects have been
executed.
To make a UNION ALL operation bypass the temporary table we could provide the
JOIN objects created for the selects of the union unit with an interceptor
object that differs from the one they use now. In the current code they use an
object of the class select_union, derived from the select_result_interceptor
class. The new interceptor object, of a class that we'll call
select_union_send (by analogy with the class select_send), shall inherit from
select_union and shall have its own implementations of the virtual methods
send_data, send_fields, and send_eof.
The method send_data shall send the fields received from a select to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before
it starts getting records from the first select, while the method send_eof
shall signal the end of the rows after the last select finishes sending
records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select of the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects of the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY
is used at the top level of a query, then any UNION ALL operation after the
last UNION operation can be executed in a more efficient way than in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to the
output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
undergo a serious re-work.


3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without ORDER BY clause send
any row received from an operand of a UNION operation directly to the output
stream as soon as it has been checked by a lookup in the temporary table that
it's not a duplicate.
3. For a union unit used in an EXISTS or IN subquery do not use a temporary
table.


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
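
The column-type check of section 1.2 and the per-select bitmap of section 2.2 can be sketched together as follows. This is a simplified illustration under loud assumptions: ToyType, ToyColumns, toy_result_types and toy_conversion_bitmap are hypothetical names, and the three-value type ordering only mirrors the int/bigint/double example above. The real coercion rules of the server (lengths, collations, many more types) are much richer and are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy types, ordered to match the int/bigint/double example of section 1.2.
enum class ToyType { Int, BigInt, Double };
using ToyColumns = std::vector<ToyType>;

// Type of column k of the union unit: the widest type (in the toy ordering)
// found in position k of any select. All selects must have the same arity.
ToyColumns toy_result_types(const std::vector<ToyColumns> &selects) {
  assert(!selects.empty());
  ToyColumns result = selects.front();
  for (const ToyColumns &cols : selects) {
    assert(cols.size() == result.size());
    for (std::size_t k = 0; k < cols.size(); ++k)
      if (result[k] < cols[k]) result[k] = cols[k];
  }
  return result;
}

// Bitmap for one select: bit k is set when column k differs from the result
// type and therefore has to pass through the record buffer for conversion;
// a cleared bit means the field could be sent to the output stream directly.
std::vector<bool> toy_conversion_bitmap(const ToyColumns &select_cols,
                                        const ToyColumns &result) {
  assert(select_cols.size() == result.size());
  std::vector<bool> bitmap(select_cols.size());
  for (std::size_t k = 0; k < select_cols.size(); ++k)
    bitmap[k] = select_cols[k] != result[k];
  return bitmap;
}
```

For three selects whose second columns are int, bigint and double, the second result column becomes double in this model; the bitmap of the first select then marks only that column for conversion, while the bitmap of the third select is empty, so all of its fields could bypass the record buffer.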


[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

14 Aug '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 
1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. -2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. 
Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). 
At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. +The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). 
If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. +The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. 
+To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. + +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. 
In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. +The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. 
Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. + +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. 
Other possible optimizations for union units
+=================================
+
+The following optimizations are not supposed to be implemented in the framework
+this task.
+1. For a union unit containing only UNION ALL with an ORDER BY send rows from
+selects directly to the sorting procedure.
+2. For a union unit at the top level of the query without ORDER BY clause send
+any row received from an operand of a UNION operation directly to the output
+stream as soon as it has been checked by a lookup in the temporary table that
+it's not a duplicate.
+3. Not to use temporary table for any union unit used in EXIST or IN subquery.
+

DESCRIPTION:

Currently, when any union operation is executed, the rows received from its
operands are always sent to a temporary table. For a UNION ALL operation that
is used at the top level of a query without an ORDER BY clause this is not
necessary: the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL
operation that does not use a temporary table at all in certain, most commonly
used, cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows us to use these operations in a sequence, one after another. For example
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);                                 (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);                                 (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);                                 (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);                                 (4)
It should be noted that query (4) is equivalent to query (1). At the same time
query (3) is not equivalent to any of the queries (1),(2),(4).
In general, any UNION ALL in a sequence of union operations can be equivalently
replaced by UNION if another UNION occurs later in the sequence.
MySQL does not accept nested unions. For example the following query, although
valid SQL, is rejected by MySQL Server as erroneous:
  ( (select a1,b1 from t1 where a1=b1) union
    (select a2,b2 from t2 where a2!=b2) )
  union all
  ( (select a3,b3 from t3 where a3=b3) union
    (select a4,b4 from t4 where a4!=b4) )

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not a part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, the further processing of a union unit is
performed by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then
checks that all selects are compatible. The method checks that the selects
return the same number of columns and that for each column position k there is
a type to which the types of the k-th columns of all selects can be coerced.
This type is considered as the type of column k of the result set returned by
the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively then the second column of the union unit will be
of the type double. If the types of the columns c1,c2,c3 are specified as
varchar(10), varchar(20), varchar(10) then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations
then a collation from which all these collations can be derived is looked for
and it is assigned as the collation of the third column in the result set.
After compatibility of the corresponding select columns has been checked and the
types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it is executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added, for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For a row it receives from the currently executed select the method first stores
the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record. All needed type
conversions of the field values are performed when they are stored in the record
buffer. After this the method select_union::send_data calls the ha_write_row
handler function to write the record from the buffer to the temporary table. A
possible error on duplicate key that occurs with an attempt to write a duplicate
row is ignored.
After all rows received from all selects have been placed into the temporary
table the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit then
the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
==============================================================

The following three optimizations are proposed to be implemented in the
framework of this task.
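
As background for the proposals, the current behaviour described in section
1.3 can be sketched as follows. This is only a toy model written for
illustration, not server code: TempTable, Op and exec_union_unit are
hypothetical names, a std::set stands in for the unique index on all fields,
and rows are plain vectors of strings.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical stand-in for the temporary table of a union unit.
struct TempTable {
  std::vector<Row> rows;        // stored rows, in insertion order
  std::set<Row> unique_idx;     // stand-in for the unique index on all fields
  bool unique_enabled = false;  // toggled by the executor

  // A duplicate-key failure is silently ignored, as section 1.3 describes.
  void write_row(const Row &r) {
    if (unique_enabled && !unique_idx.insert(r).second) return;
    rows.push_back(r);
  }
};

enum class Op { Union, UnionAll };

// selects[i] holds the rows of the i-th select; ops[i] is the operation
// between selects[i] and selects[i + 1].
std::vector<Row> exec_union_unit(const std::vector<std::vector<Row>> &selects,
                                 const std::vector<Op> &ops) {
  assert(selects.empty() ? ops.empty() : ops.size() + 1 == selects.size());

  // Number of leading selects written while the unique index is enabled:
  // everything up to and including the second operand of the last UNION.
  std::size_t enabled_until = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == Op::Union) enabled_until = i + 2;

  TempTable table;
  for (std::size_t i = 0; i < selects.size(); ++i) {
    table.unique_enabled = i < enabled_until;
    for (const Row &r : selects[i]) table.write_row(r);
  }
  return table.rows;  // only now would the rows be sent to the client
}
```

In this model a unit of the form "s1 union all s2 union s3" keeps the index
enabled for all three selects, which is why query (4) returns the same rows as
query (1), while "s1 union s2 union all s3" disables it before s3 is written.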

2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the
query (in other words it's not used as a subquery) and is not followed by an
ORDER BY clause then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done
the row fields could be sent directly into the output stream. It would improve
the performance of UNION ALL operations since writing to the temporary table and
reading from it would not be needed anymore. In the cases when the result set is
so big that the temporary table cannot be allocated in main memory the
performance gains would be significant. Besides, the client could get the first
result rows at once as it would not have to wait until all selects have been
executed.
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversion and which do not. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than it's done in the
current implementation. More exactly, the rows from any select that follows
the second operand of the last UNION operation could be sent directly to
the output stream. In this case two interceptor objects have to be created: one,
of the type select_union, is shared by the selects for which UNION operations
are performed; another, of the type select_union_send, is shared by the
remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to
undergo a serious re-work.

3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as it has been checked by a lookup in the temporary table
that it's not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
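
The two-interceptor arrangement from sections 2.1 and 2.3 can be sketched in
a few lines. This is a simplified model under stated assumptions, not the
proposed patch: Interceptor, UnionBuffer, UnionSend and exec_mixed are
hypothetical names standing in for select_result_interceptor, select_union,
select_union_send and the reworked SELECT_LEX_UNIT::exec, and a vector stands
in for the client output stream. Type conversion and the record buffer are
left out.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Simplified stand-in for the interceptor base class (hypothetical).
struct Interceptor {
  virtual ~Interceptor() = default;
  virtual void send_data(const Row &row) = 0;
};

// Stand-in for select_union: buffers rows and drops duplicates.
struct UnionBuffer : Interceptor {
  std::vector<Row> rows;
  std::set<Row> seen;
  void send_data(const Row &row) override {
    if (seen.insert(row).second) rows.push_back(row);
  }
};

// Stand-in for the proposed select_union_send: forwards each row at once.
struct UnionSend : Interceptor {
  std::vector<Row> &out;
  explicit UnionSend(std::vector<Row> &o) : out(o) {}
  void send_data(const Row &row) override { out.push_back(row); }
};

// distinct_count is the number of leading selects that take part in UNION
// operations (0 for a unit that contains only UNION ALL).
std::vector<Row> exec_mixed(const std::vector<std::vector<Row>> &selects,
                            std::size_t distinct_count) {
  std::vector<Row> out;  // stand-in for the output stream to the client
  UnionBuffer buffer;
  UnionSend direct(out);
  for (std::size_t i = 0; i < selects.size(); ++i) {
    Interceptor *sink = &direct;
    if (i < distinct_count) sink = &buffer;
    for (const Row &r : selects[i]) sink->send_data(r);
    // Once the last UNION operand is done, flush the de-duplicated rows.
    if (i + 1 == distinct_count)
      out.insert(out.end(), buffer.rows.begin(), buffer.rows.end());
  }
  return out;
}
```

With distinct_count equal to 0 the buffer is never touched, which corresponds
to the pure UNION ALL case of section 2.1; with a non-zero value only the
trailing UNION ALL operands bypass the buffer, as in section 2.3.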

[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 14 Aug 2009, 08:45)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22406 2009-08-14 08:45:22.000000000 +0300 +++ /tmp/wklog.44.new.22406 2009-08-14 08:45:22.000000000 +0300 @@ -6,15 +6,15 @@ 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying - 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server -================================== +============================================ 1.1. Specifics of MySQL union operations ------------------------------------------------------- +---------------------------------------- UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example @@ -49,7 +49,7 @@ In this case it cannot be used as a subquery. 1.2 Validation of union units ----------------------------------- +----------------------------- When the parser stage is over the further processing of a union unit is performed by the function mysql_union. @@ -77,7 +77,7 @@ select_union. All selects from a union unit share the same select_union object. 
1.3 Execution of union units ----------------------------------- +---------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned @@ -109,13 +109,13 @@ rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations -================================================= +=============================================================== The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------- +-------------------------------------------------- If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not appended with an @@ -159,7 +159,7 @@ }; 2.2. Avoiding unnecessary copying ------------------------------------------- +--------------------------------- If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects @@ -174,8 +174,8 @@ needed that would take as parameter the info that says what fields are to be stored in the record buffer. -2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ----------------------------------------------------------------------------------------------------------- +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL +---------------------------------------------------------------------- If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last @@ -190,7 +190,7 @@ 3. 
Other possible optimizations for union units -================================= +=============================================== The following optimizations are not supposed to be implemented in the framework this task. -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). 
At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. +The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). 
If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. +The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. 
+To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. + +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. 
In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. +The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. 
Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. + +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. 
Other possible optimizations for union units +================================= + +The following optimizations are not supposed to be implemented in the framework +this task. +1. For a union unit containing only UNION ALL with an ORDER BY send rows from +selects directly to the sorting procedure. +2. For a union unit at the top level of the query without ORDER BY clause send +any row received from an operand of a UNION operation directly to the output +stream as soon as it has been checked by a lookup in the temporary table that +it's not a duplicate. +3. Not to use temporary table for any union unit used in EXIST or IN subquery. +

DESCRIPTION:

Currently, when any union operation is executed, the rows received from its operands are always sent to a temporary table. For a UNION ALL operation used at the top level of a query without an ORDER BY clause this is not necessary: the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL operation that does not use a temporary table at all in certain, most common cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
============================================

1.1. Specifics of MySQL union operations
----------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another.
For example, the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);            (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);            (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);            (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);            (4)
It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be equivalently replaced by UNION if another UNION occurs further in the sequence.
MySQL does not accept nested unions. For example, the following otherwise valid query is rejected by MySQL Server as erroneous:
  ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) )
  union all
  ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )

A sequence of select constructs separated by UNION/UNION ALL is called a 'union unit' if it is not part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct. In this case it cannot be used as a subquery.

1.2 Validation of union units
-----------------------------

When the parser stage is over, the further processing of a union unit is performed by the function mysql_union.
The function first validates the unit by calling the method SELECT_LEX_UNIT::prepare.
The method first validates each of the select constructs of the unit and then checks that all selects are compatible.
The method checks that the selects return the same number of columns and that, for each column position k, there is a type to which the types of the columns in that position can be coerced. This type is taken as the type of column k of the result set returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int, bigint and double respectively, then the second column of the union unit will be of the type double. If the types of the columns c1, c2, c3 are specified as varchar(10), varchar(20), varchar(10), then the type of the corresponding column of the result set will be varchar(20). If the columns have different collations, then a collation from which all these collations can be derived is looked for and is assigned as the collation of the third column in the result set.
After the compatibility of the corresponding select columns has been checked and the types of the columns of the result set have been determined, the method SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the result set for the union unit. Currently, rows returned by the selects of the union unit are always written into a temporary table. To force the selects to send rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for the selects such that the JOIN::result field refers to an object of the class select_union. All selects of a union unit share the same select_union object.

1.3 Execution of union units
----------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned by the selects of the unit, and has prepared all data structures needed for execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects of the union unit one by one.
Each select is first optimized with JOIN::optimize(), then it is executed with JOIN::exec(). The result rows from each select are sent to a temporary table. This table accumulates all rows that are to be returned by the union unit. For UNION operations duplicate rows are not added; for UNION ALL operations all records are added. This is achieved by enabling and disabling usage of the unique index defined on all fields of the temporary table. The index is never used if only UNION ALL operations occur in the unit. Otherwise it is enabled before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used. For a row it receives from the currently executed select, the method first stores the fields of the row in the fields of the record buffer of the temporary table. To do this the method calls the function fill_record. All needed type conversions of the field values are performed when they are stored in the record buffer. After this the method select_union::send_data calls the ha_write_row handler function to write the record from the buffer to the temporary table. A possible duplicate-key error that occurs on an attempt to write a duplicate row is ignored.
After all rows received from all selects have been placed into the temporary table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows from the temporary table and sends them to the output stream (to the client). If there is an ORDER BY clause to be applied to the result of the union unit, then the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
==============================================================

The following three optimizations are proposed to be implemented in the framework of this task.
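
As a point of reference before the individual optimizations, the current behaviour described in section 1.3 can be modelled in a few lines of C++. This is only a sketch under stated assumptions, not server code: Row, Op and run_union_unit are hypothetical names, rows are plain integer vectors, and a std::set stands in for the unique index on the temporary table.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are MySQL types.
using Row = std::vector<int>;
enum class Op { Union, UnionAll };

// selects[i] holds the rows produced by the i-th select of the unit;
// ops[i] is the operation between selects[i] and selects[i + 1].
std::vector<Row> run_union_unit(const std::vector<std::vector<Row>>& selects,
                                const std::vector<Op>& ops) {
  // Find the second operand of the last UNION operation, if there is one.
  bool any_union = false;
  std::size_t last_union_operand = 0;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (ops[i] == Op::Union) {
      any_union = true;
      last_union_operand = i + 1;
    }
  }

  std::vector<Row> temp_table;   // models the temporary table
  std::set<Row> unique_index;    // models the unique index on all fields
  for (std::size_t s = 0; s < selects.size(); ++s) {
    // The index is enabled from the first select up to and including the
    // second operand of the last UNION, and disabled afterwards.
    const bool index_enabled = any_union && s <= last_union_operand;
    for (const Row& row : selects[s]) {
      // A duplicate-key failure is silently ignored, as in section 1.3.
      if (index_enabled && !unique_index.insert(row).second) continue;
      temp_table.push_back(row);
    }
  }
  return temp_table;  // rows that would then be read back and sent out
}
```

With three selects combined as in queries (1) and (4), this model returns the same rows for both, while the shapes of queries (2) and (3) keep the duplicates produced by the trailing UNION ALL operands.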

2.1 Execution of UNION ALL without temporary table
--------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the query (in other words, it is not used as a subquery) and is not followed by an ORDER BY clause, then it does not make sense to send rows received from the selects to a temporary table at all. After all needed type conversions have been done, the row fields could be sent directly to the output stream. This would improve the performance of UNION ALL operations, since writing to the temporary table and reading from it would no longer be needed. When the result set is so big that the temporary table cannot be kept in main memory, the performance gains would be significant. Besides, the client could get the first result rows at once, as it would not have to wait until all selects have been executed.
To keep a UNION ALL operation from sending rows to a temporary table we could provide the JOIN objects created for the selects of the union unit with an interceptor object that differs from the one they use now. In the current code they use an object of the class select_union, derived from the select_result_interceptor class. The new interceptor object, of a class that we'll call select_union_send (by analogy with the class select_send), shall inherit from select_union and shall have its own implementations of the virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record buffer of the temporary table and then from this buffer to the output stream. The method send_fields shall send the format of the rows to the client before it starts getting records from the first select, while the method send_eof shall signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be redefined as virtual.
The implementation of this method for the class select_union_send shall call select_union::create_result_table and then build the internal structures needed for select_union_send::send_data. So, the definition of the class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
---------------------------------

If a field does not need type conversion, it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects can require type conversions for different columns.
Let's provide each select of the union unit with a data structure (e.g. a bitmap) that says which fields require conversion and which do not. Before execution of a select this data structure must be passed to the select_union_send object shared by all selects of the unit. The info in this structure will tell select_union_send::send_data which fields should be sent to the record buffer for type conversion and which can be sent directly to the output stream. In this case another variant of the fill_record procedure is needed that would take as a parameter the info that says which fields are to be stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL
----------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is used at the top level of a query, then any UNION ALL operation after the last UNION operation can be executed in a more efficient way than in the current implementation.
More precisely, the rows from any select that follows the second operand of the last UNION operation could be sent directly to the output stream. In this case two interceptor objects have to be created: one, of the type select_union, is shared by the selects for which UNION operations are performed; the other, of the type select_union_send, is shared by the remaining selects. For this optimization the method SELECT_LEX_UNIT::exec has to undergo a serious rework.

3. Other possible optimizations for union units
===============================================

The following optimizations are not supposed to be implemented in the framework of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from the selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause, send any row received from an operand of a UNION operation directly to the output stream as soon as a lookup in the temporary table has confirmed that it is not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
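
The two-interceptor arrangement proposed in sections 2.1 and 2.3 of the specification above can be sketched as follows. This is a toy model under loud assumptions, not the proposed server code: RowSink, TempTableSink, DirectSendSink, Result and execute_unit are hypothetical names, TempTableSink only plays the role of select_union, DirectSendSink only plays the role of the proposed select_union_send, and type conversion is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Row = std::vector<int>;            // hypothetical row type
enum class Op { Union, UnionAll };

// Illustrative interceptor interface (not the MySQL select_result class).
struct RowSink {
  virtual ~RowSink() = default;
  virtual void send_data(const Row& row) = 0;
};

// Plays the role of select_union: rows go to a de-duplicating temporary table.
struct TempTableSink : RowSink {
  std::vector<Row> table;
  std::set<Row> unique_index;
  void send_data(const Row& row) override {
    if (unique_index.insert(row).second) table.push_back(row);
  }
};

// Plays the role of select_union_send: rows go straight to the output stream.
struct DirectSendSink : RowSink {
  std::vector<Row>& out;
  explicit DirectSendSink(std::vector<Row>& o) : out(o) {}
  void send_data(const Row& row) override { out.push_back(row); }
};

struct Result {
  std::vector<Row> output;     // rows in the order the client would see them
  std::size_t temp_rows = 0;   // rows that had to be stored in the temp table
};

Result execute_unit(const std::vector<std::vector<Row>>& selects,
                    const std::vector<Op>& ops) {
  // Number of leading selects that are operands of some UNION operation.
  std::size_t distinct_count = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i] == Op::Union) distinct_count = i + 2;

  Result res;
  TempTableSink temp;
  DirectSendSink direct(res.output);

  // Selects up to the second operand of the last UNION use the temp table.
  for (std::size_t s = 0; s < distinct_count && s < selects.size(); ++s)
    for (const Row& row : selects[s]) temp.send_data(row);
  res.temp_rows = temp.table.size();
  for (const Row& row : temp.table) direct.send_data(row);  // read it back

  // The remaining UNION ALL operands bypass the temporary table.
  for (std::size_t s = distinct_count; s < selects.size(); ++s)
    for (const Row& row : selects[s]) direct.send_data(row);
  return res;
}
```

In this model a unit with only UNION ALL operations never touches the temporary table, and in a mixed unit only the selects up to the second operand of the last UNION do.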


[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09


----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. 
For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. 
+The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. 
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. +To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. 
+ +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. 
+The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. 
+ +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. Other possible optimizations for union units +================================= + +The following optimizations are not supposed to be implemented in the framework +this task. +1. For a union unit containing only UNION ALL with an ORDER BY send rows from +selects directly to the sorting procedure. +2. For a union unit at the top level of the query without ORDER BY clause send +any row received from an operand of a UNION operation directly to the output +stream as soon as it has been checked by a lookup in the temporary table that +it's not a duplicate. +3. Not to use temporary table for any union unit used in EXIST or IN subquery. + DESCRIPTION: Currently when any union operation is executed the rows received from its operands are always sent to a temporary table. Meanwhile for a UNION ALL operation that is used at the top level of a query without an ORDER BY clause it is not necessary. In this case the rows could be sent directly to the client. 
The goal of this task is to provide an implementation of the UNION ALL operation that does not use a temporary table at all in certain, most commonly used cases. HIGH-LEVEL SPECIFICATION: <contents> 1. Handling union operations in MySQL Server 1.1. Specifics of MySQL union operations 1.2 Validation of union units 1.3 Execution of union units 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server ================================== 1.1. Specifics of MySQL union operations ------------------------------------------------------ UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example, the following queries are accepted by MySQL Server: (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (1) (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (2) Any mix of UNION and UNION ALL is also acceptable: (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (3) (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (4) It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1),(2),(4). In general, any UNION ALL in a sequence of union operations can be equivalently replaced by UNION if another UNION occurs further in the sequence. MySQL does not accept nested unions.
For example, the following valid query is rejected by MySQL Server as erroneous: ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) ) union all ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) A sequence of select constructs separated by UNION/UNION ALL is called a 'union unit' if it is not a part of another such sequence. A union unit can be executed as a query. It can also be used as a subquery. A union unit can optionally be followed by an ORDER BY and/or LIMIT construct. In this case it cannot be used as a subquery. 1.2 Validation of union units ---------------------------------- When the parser stage is over, further processing of a union unit is performed by the function mysql_union. The function first validates the unit by calling the method SELECT_LEX_UNIT::prepare. The method first validates each of the select constructs of the unit and then checks that all selects are compatible. The method checks that the selects return the same number of columns and that for each set of columns at the same position k there is a type to which the types of the columns can be coerced. This type is taken as the type of column k of the result set returned by the union unit. For example, if in the query (1) the columns b1, b2 and b3 are of the types int, bigint and double respectively, then the second column of the union unit will be of the type double. If the types of the columns c1,c2,c3 are specified as varchar(10), varchar(20), varchar(10), then the type of the corresponding column of the result set will be varchar(20). If the columns have different collations, then a collation from which all these collations can be derived is looked for and it is assigned as the collation of the third column in the result set.
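The column-type check described above can be illustrated with a small, self-contained sketch. It is not the server's code: the names Kind, ColType, unify and union_row_type are hypothetical, and the widening rule (int < bigint < double, longest varchar wins, numeric/varchar mixing rejected) is an assumption chosen only to reproduce the two examples given in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, heavily simplified column type: only the kinds that appear
// in the example from the text (int, bigint, double, varchar(n)).
enum class Kind { Int, BigInt, Double, Varchar };

struct ColType {
  Kind kind;
  unsigned length;  // meaningful for Varchar only, 0 otherwise
};

// Assumed rule for this sketch: numeric kinds widen in the order
// Int < BigInt < Double, and varchar columns take the larger length.
// Mixing numeric and varchar columns is reported as an error here; the
// real server has richer coercion rules.
ColType unify(const ColType &a, const ColType &b) {
  const bool a_str = a.kind == Kind::Varchar;
  const bool b_str = b.kind == Kind::Varchar;
  if (a_str != b_str) throw std::invalid_argument("incompatible column types");
  if (a_str) return {Kind::Varchar, std::max(a.length, b.length)};
  return {std::max(a.kind, b.kind), 0};
}

using RowType = std::vector<ColType>;

// Checks that all selects return the same number of columns and computes,
// for each position k, a type to which all k-th columns can be coerced.
RowType union_row_type(const std::vector<RowType> &selects) {
  if (selects.empty()) throw std::invalid_argument("no selects");
  RowType result = selects.front();
  for (std::size_t s = 1; s < selects.size(); ++s) {
    if (selects[s].size() != result.size())
      throw std::invalid_argument("selects have different numbers of columns");
    for (std::size_t k = 0; k < result.size(); ++k)
      result[k] = unify(result[k], selects[s][k]);
  }
  return result;
}
```

With the types from the example (int, bigint, double for the b columns and varchar(10), varchar(20), varchar(10) for the c columns), union_row_type yields double and varchar(20) for the corresponding result columns.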
After compatibility of the corresponding select columns has been checked and the types of the columns of the result set have been determined, the method SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the result set for the union unit. Currently rows returned by the selects from the union unit are always written into a temporary table. To force selects to send rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for the selects such that the JOIN::result field refers to an object of the class select_union. All selects from a union unit share the same select_union object. 1.3 Execution of union units ---------------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned by the selects of the unit, and has prepared all data structures needed for execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. The method SELECT_LEX_UNIT::exec processes the selects from the union unit one by one. Each select is first optimized with JOIN::optimize(), then it is executed with JOIN::exec(). The result rows from each select are sent to a temporary table. This table accumulates all rows that are to be returned by the union unit. For UNION operations duplicate rows are not added; for UNION ALL operations all records are added. This is achieved by enabling and disabling usage of the unique index defined on all fields of the temporary table. The index is never used if only UNION ALL operations occur in the unit. Otherwise it is enabled before the first select is executed and disabled after the last UNION operation. To send rows to the temporary table the method select_union::send_data is used. For a row it receives from the currently executed select the method first stores the fields of the row in the fields of the record buffer of the temporary table. To do this the method calls the function fill_record.
All needed type conversions of the field values are performed when they are stored in the record buffer. After this the method select_union::send_data calls the ha_write_row handler function to write the record from the buffer to the temporary table. A possible duplicate-key error that occurs on an attempt to write a duplicate row is ignored. After all rows received from all selects have been placed into the temporary table the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows from the temporary table and sends them to the output stream (to the client). If there is an ORDER BY clause to be applied to the result of the union unit then the rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations ================================================= The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------ If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not followed by an ORDER BY clause then it does not make sense to send rows received from selects to a temporary table at all. After all needed type conversions have been done the row fields could be sent directly into the output stream. This would improve the performance of UNION ALL operations since writing to the temporary table and reading from it would not be needed anymore. In the cases when the result set is so big that the temporary table cannot be allocated in main memory the performance gains would be significant. Besides, the client could get the first result rows right away, as it would not have to wait until all selects have been executed.
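The difference between the current execution path and the one proposed in 2.1 can be sketched with two row sinks shared by all selects of a unit. This is only an illustration under simplifying assumptions: RowSink, BufferingSink, StreamingSink and run_union are hypothetical names, a row is modelled as a vector of strings, the client output stream is a plain vector, and duplicate elimination is a per-unit flag rather than a unique index that is enabled and disabled between operations.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using Row = std::vector<std::string>;  // simplified row: one string per field
using Output = std::vector<Row>;       // stand-in for the client output stream

// Hypothetical common interface of the object that receives rows from selects.
class RowSink {
 public:
  virtual ~RowSink() = default;
  virtual void send_data(const Row &row) = 0;
  virtual void send_eof() = 0;
};

// Models the current behaviour: every row goes to a "temporary table" first
// and reaches the client only after the last select has finished.
class BufferingSink : public RowSink {
 public:
  BufferingSink(Output &out, bool distinct) : out_(out), distinct_(distinct) {}
  void send_data(const Row &row) override {
    // A duplicate row is silently dropped, like an ignored duplicate-key error.
    if (distinct_ && !seen_.insert(row).second) return;
    table_.push_back(row);
  }
  void send_eof() override {
    for (const Row &row : table_) out_.push_back(row);
    table_.clear();
  }

 private:
  Output &out_;
  bool distinct_;
  std::set<Row> seen_;
  std::vector<Row> table_;
};

// Models the proposed behaviour for a top-level UNION ALL without ORDER BY:
// each row is forwarded to the client as soon as it is produced.
class StreamingSink : public RowSink {
 public:
  explicit StreamingSink(Output &out) : out_(out) {}
  void send_data(const Row &row) override { out_.push_back(row); }
  void send_eof() override {}

 private:
  Output &out_;
};

// Runs the selects of a unit one by one through the single shared sink.
void run_union(const std::vector<std::vector<Row>> &selects, RowSink &sink) {
  for (const std::vector<Row> &select : selects)
    for (const Row &row : select) sink.send_data(row);
  sink.send_eof();
}
```

In this sketch the streaming sink makes a row visible in the output immediately after send_data, while the buffering sink shows nothing until send_eof. That is the latency and temporary-table cost the optimization aims to remove.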
To keep a UNION ALL operation from sending rows to a temporary table we could provide the JOIN objects created for the selects from the union unit with an interceptor object that differs from the one they use now. In the current code they use an object of the class select_union derived from the select_result_interceptor class. The new interceptor object, of a class that we'll call select_union_send (by analogy with the class select_send), shall inherit from select_union and shall have its own implementations of the virtual methods send_data, send_fields, and send_eof. The method send_data shall send fields received from selects to the record buffer of the temporary table and then from this buffer to the output stream. The method send_fields shall send the format of the rows to the client before it starts getting records from the first select, while the method send_eof shall signal the end of the rows after the last select finishes sending records. The method create_result_table of the class select_union shall be re-defined as virtual. The implementation of this method for the class select_union_send shall call select_union::create_result_table and then shall build internal structures needed for select_union_send::send_data. So, the definition of the class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };
2.2. Avoiding unnecessary copying ------------------------------------------ If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a bitmap) that says which fields require conversion and which do not. Before execution of a select this data structure must be passed to the select_union_send object shared by all selects from the unit. The info in this structure will tell select_union_send::send_data which fields should be sent to the record buffer for type conversion and which can be sent directly to the output stream. In this case another variant of the fill_record procedure is needed that would take as a parameter the info that says which fields are to be stored in the record buffer. 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations ---------------------------------------------------------------------------------------------------------- If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last UNION operation can be executed in a more efficient way than in the current implementation. More precisely, the rows from any select that follows the second operand of the last UNION operation could be sent directly to the output stream. In this case two interceptor objects have to be created: one, of the type select_union, is shared by the selects for which UNION operations are performed; another, of the type select_union_send, is shared by the remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to undergo a serious re-work. 3. Other possible optimizations for union units ================================= The following optimizations are not supposed to be implemented in the framework of this task. 1. For a union unit containing only UNION ALL with an ORDER BY send rows from selects directly to the sorting procedure. 2.
For a union unit at the top level of the query without an ORDER BY clause send any row received from an operand of a UNION operation directly to the output stream as soon as it has been checked by a lookup in the temporary table that it's not a duplicate. 3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)


[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09


----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. 
For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. 
+The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. 
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. +To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. 
+ +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. 
+The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. 
+ +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. Other possible optimizations for union units +================================= + +The following optimizations are not supposed to be implemented in the framework +this task. +1. For a union unit containing only UNION ALL with an ORDER BY send rows from +selects directly to the sorting procedure. +2. For a union unit at the top level of the query without ORDER BY clause send +any row received from an operand of a UNION operation directly to the output +stream as soon as it has been checked by a lookup in the temporary table that +it's not a duplicate. +3. Not to use temporary table for any union unit used in EXIST or IN subquery. + DESCRIPTION: Currently when any union operation is executed the rows received from its operands are always sent to a temporary table. Meanwhile for a UNION ALL operation that is used at the top level of a query without an ORDER BY clause it is not necessary. In this case the rows could be sent directly to the client. 
The goal of this task is to provide an implementation of the UNION ALL operation that does not use a temporary table at all in certain, most commonly used cases. HIGH-LEVEL SPECIFICATION: <contents> 1. Handling union operations in MySQL Server 1.1. Specifics of MySQL union operations 1.2 Validation of union units 1.3 Execution of union units 2. Optimizations improving performance of UNION ALL operations 2.1 Execution of UNION ALL without temporary table 2.2. Avoiding unnecessary copying 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations 3. Other possible optimizations for union units </contents> 1. Handling union operations in MySQL Server ================================== 1.1. Specifics of MySQL union operations ------------------------------------------------------ UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example, the following queries are accepted by MySQL Server: (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (1) (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (2) Any mix of UNION and UNION ALL is also acceptable: (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where a2!=b2) union all (select a3,b3,c3 from t3 where a3>b3); (3) (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where a2!=b2) union (select a3,b3,c3 from t3 where a3>b3); (4) It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1),(2),(4). In general, any UNION ALL in a sequence of union operations can be equivalently replaced by UNION if another UNION occurs further in the sequence. MySQL does not accept nested unions.
For example, the following valid query is rejected by MySQL Server as erroneous: ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) ) union all ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) A sequence of select constructs separated by UNION/UNION ALL is called a 'union unit' if it is not a part of another such sequence. A union unit can be executed as a query. It can also be used as a subquery. A union unit can optionally be followed by an ORDER BY and/or LIMIT construct. In this case it cannot be used as a subquery. 1.2 Validation of union units ---------------------------------- When the parser stage is over, further processing of a union unit is performed by the function mysql_union. The function first validates the unit by calling the method SELECT_LEX_UNIT::prepare. The method first validates each of the select constructs of the unit and then checks that all selects are compatible. The method checks that the selects return the same number of columns and that for each set of columns at the same position k there is a type to which the types of the columns can be coerced. This type is taken as the type of column k of the result set returned by the union unit. For example, if in the query (1) the columns b1, b2 and b3 are of the types int, bigint and double respectively, then the second column of the union unit will be of the type double. If the types of the columns c1,c2,c3 are specified as varchar(10), varchar(20), varchar(10), then the type of the corresponding column of the result set will be varchar(20). If the columns have different collations, then a collation from which all these collations can be derived is looked for and it is assigned as the collation of the third column in the result set.
After compatibility of the corresponding select columns has been checked and the types of the columns of the result set have been determined, the method SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the result set for the union unit. Currently rows returned by the selects from the union unit are always written into a temporary table. To force selects to send rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for the selects such that the JOIN::result field refers to an object of the class select_union. All selects from a union unit share the same select_union object. 1.3 Execution of union units ---------------------------------- After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned by the selects of the unit, and has prepared all data structures needed for execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. The method SELECT_LEX_UNIT::exec processes the selects from the union unit one by one. Each select is first optimized with JOIN::optimize(), then it is executed with JOIN::exec(). The result rows from each select are sent to a temporary table. This table accumulates all rows that are to be returned by the union unit. For UNION operations duplicate rows are not added; for UNION ALL operations all records are added. This is achieved by enabling and disabling usage of the unique index defined on all fields of the temporary table. The index is never used if only UNION ALL operations occur in the unit. Otherwise it is enabled before the first select is executed and disabled after the last UNION operation. To send rows to the temporary table the method select_union::send_data is used. For a row it receives from the currently executed select the method first stores the fields of the row in the fields of the record buffer of the temporary table. To do this the method calls the function fill_record.
All needed type conversions of the field values are performed when they are stored in the record buffer. After this the method select_union::send_data calls the ha_write_row handler function to write the record from the buffer to the temporary table. A possible duplicate-key error that occurs on an attempt to write a duplicate row is ignored. After all rows received from all selects have been placed into the temporary table the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows from the temporary table and sends them to the output stream (to the client). If there is an ORDER BY clause to be applied to the result of the union unit then the rows read from the temporary table have to be sorted first. 2. Optimizations improving performance of UNION ALL operations ================================================= The following three optimizations are proposed to be implemented in the framework of this task. 2.1 Execution of UNION ALL without temporary table ------------------------------------------------------------------ If a union unit with only UNION ALL operations is used at the top level of the query (in other words it's not used as a subquery) and is not followed by an ORDER BY clause then it does not make sense to send rows received from selects to a temporary table at all. After all needed type conversions have been done the row fields could be sent directly into the output stream. This would improve the performance of UNION ALL operations since writing to the temporary table and reading from it would not be needed anymore. In the cases when the result set is so big that the temporary table cannot be allocated in main memory the performance gains would be significant. Besides, the client could get the first result rows right away, as it would not have to wait until all selects have been executed.
To make a UNION ALL operation not send rows to a temporary table we could provide the JOIN objects created for the selects from the union unit with an interceptor object that differs from the one they use now. In the current code they use an object of the class select_union, derived from the select_result_interceptor class. The new interceptor object, of a class that we'll call select_union_send (by analogy with the class select_send), shall inherit from select_union and shall have its own implementations of the virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record buffer of the temporary table and then from this buffer to the output stream. The method send_fields shall send the format of the rows to the client before it starts getting records from the first select, while the method send_eof shall signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be redefined as virtual. The implementation of this method for the class select_union_send shall call select_union::create_result_table and then build the internal structures needed for select_union_send::send_data. So, the definition of the class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
------------------------------------------

If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a bitmap) that says which fields require conversion and which do not. Before execution of a select this data structure must be passed to the select_union_send object shared by all selects from the unit. The info in this structure will tell select_union_send::send_data which fields should be sent to the record buffer for type conversion and which can be sent directly to the output stream. In this case another variant of the fill_record procedure is needed that would take as a parameter the info that says which fields are to be stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
----------------------------------------------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last UNION operation can be executed in a more efficient way than in the current implementation. More precisely, the rows from any select that follows the second operand of the last UNION operation could be sent directly to the output stream. In this case two interceptor objects have to be created: one, of the type select_union, is shared by the selects for which UNION operations are performed; another, of the type select_union_send, is shared by the remaining selects. For this optimization the method SELECT_LEX_UNIT::exec will have to undergo a serious rework.

3. Other possible optimizations for union units
=================================

The following optimizations are not planned for implementation within this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause, send any row received from an operand of a UNION operation directly to the output stream as soon as a lookup in the temporary table has confirmed that it is not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

14 Aug '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. --- /tmp/wklog.44.old.22182 2009-08-14 08:41:17.000000000 +0300 +++ /tmp/wklog.44.new.22182 2009-08-14 08:41:17.000000000 +0300 @@ -1 +1,205 @@ +<contents> +1. Handling union operations in MySQL Server + 1.1. Specifics of MySQL union operations + 1.2 Validation of union units + 1.3 Execution of union units +2. Optimizations improving performance of UNION ALL operations + 2.1 Execution of UNION ALL without temporary table + 2.2. Avoiding unnecessary copying + 2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +3. Other possible optimizations for union units +</contents> + +1. Handling union operations in MySQL Server +================================== + +1.1. Specifics of MySQL union operations +------------------------------------------------------ + +UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL +allows us to use these operations in a sequence, one after another. 
For example +the following queries are accepted by the MySQL Server: + (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (1) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (2) +Any mix of UNION and UNION ALL is also acceptable: + (select a1,b1,c3 from t1 where a1=b1) union (select a2,b2,c3 from t2 where +a2!=b2) union all + (select a3,b3,c3 from t3 where a3>b3); (3) + (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where +a2!=b2) union + (select a3,b3,c3 from t3 where a3>b3); (4) +It should be noted that query (4) is equivalent to query (1). At the same time +query (3) is not equivalent to any of the queries (1),(2),(4). +In general any UNION ALL in a sequence of union operations can be equivalently +substituted for UNION if there occur another UNION further in the sequence. +MySQL does not accept nested unions. For example the following valid query is +considered by MySQL Server as erroneous: + ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) +) union all + ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) ) + +A sequence of select constructs separated by UNION/UNION ALL is called 'union +unit' if it s not a part of another such sequence. +A union unit can be executed as a query. It also can be used as a subquery. +A union unit can be optionally appended by an ORDER BY and/or LIMIT construct. +In this case it cannot be used as a subquery. + +1.2 Validation of union units +---------------------------------- + +When the parser stage is over the further processing of a union unit is +performed by the function mysql_union. +The function first validate the unit in the method SELECT_LEX_UNIT::prepare. 
+The method first validates each of the select constructs of the unit and then it +checks that all select are compatible. The method checks that the selects return +the same number of columns and for each set of columns with the same number k +there is a type to which the types of the columns can be coerced. This type is +considered as the type of column k of the result set returned by the union unit. +For example, if in the query (1) the columns b1, b2 and b3 are of the types int, +bigint and double respectively then the second column of the union unit will be +of the type double. If the types of the columns c1,c2,c3 are specified as +varchar(10), varchar(20), varchar(10) then the type of the corresponding column +of the result set will be varchar(20). If the columns have different collations +then a collation from which all these collations can be derived is looked for +and it is assigned as the +collation of the third column in the result set. +After compatibility of the corresponding select columns has been checked and the +types of the columns from of the result set have been determined the method +SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the +result set for the union unit. Currently rows returned by the selects from the +union unit are always written into a temporary table. To force selects to send +rows to this temporary table SELECT_LEX_UNIT::prepare creates JOIN objects for +the selects such that the JOIN::result field refers to an object of the class +select_union. All selects from a union unit share the same select_union object. + +1.3 Execution of union units +---------------------------------- + +After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has +created a temporary table as a container for rows from the result sets returned +by the selects of the unit, and has prepared all data structures needed for +execution, the function mysql_union invokes SELECT_LEX_UNIT::exec. 
+The method SELECT_LEX_UNIT::exec processes the selects from the union unit one +by one. +Each select first is optimized with JOIN::optimize(), then it's executed with +JOIN::exec().The result rows from each select are sent to a temporary table. +This table accumulates all rows that are to be returned by the union unit. For +UNION operations duplicate rows are not added, for UNION ALL operations all +records are added. It is achieved by enabling and disabling usage of the unique +index defined on all fields of the temporary table. The index is never used if +only UINION ALL operation occurs in the unit. Otherwise it is enabled before +the first select is executed and disabled after the last UNION operation. +To send rows to the temporary table the method select_union::send_data is used. +For a row it receives from the currently executed select the method first stores +the fields of the row in in the fields of the record buffer of the temporary +table. To do this the method calls function fill_record. All needed type +conversions of the field values are performed when they are stored the record +buffer. After this the method select_union::send_data calls the ha_write_row +handler function to write the record from the buffer to the temporary table. A +possible error on duplicate key that occurs with an attempt to write a duplicate +row is ignored. +After all rows received from all selects have been placed into the temporary +table the method SELECT_LEX_UNIT::exec calls mysql_select that reads rows +from the temporary table and sends them to the output stream (to the client). If +there is an ORDER BY clause to be applied to result of the union unit then the +rows read from the temporary table have to be sorted first. + +2. Optimizations improving performance of UNION ALL operations +================================================= + +The following three optimizations are proposed to be implemented in the +framework of this task. 
+ +2.1 Execution of UNION ALL without temporary table +------------------------------------------------------------------ + +If a union unit with only UNION ALL operations is used at the top level of the +query (in other words it's not used as a subquery) and is not appended with an +ORDER BY clause then it does not make sense to send rows received from selects +to a temporary table at all. After all needed type conversions have been done +the row fields could be sent directly into the output stream. It would improve +the performance of UNION ALL operations since writing to the temporary table and +reading from it would not be needed anymore. In the cases when the result set is +big enough and the temporary table cannot be allocated in the main memory the +performance gains would be significant. Besides, the client could get the first +result rows at once as it would not have to wait until all selects have been +executed. +To make an UNION ALL operation not to send rows to a temporary table we could +provide the JOIN objects created for the selects from the union unit with an +interceptor object that differs from the one they use now. In the current code +they use an object of the class select_union derived from the +select_result_interceptor class. The new interceptor object of the class that +we'll call select_union_send (by analogy with the class select_send) shall +inherit from the select_union and shall have its own implementations of the +virtual methods send_data, send_fields, and send_eof. +The method send_data shall send fields received from selects to the record +buffer of the temporary table and then from this buffer to the output stream. +The method send_fields shall send the format of the rows to the client before it +starts getting records from the first select , while the method send_eof shall +signal about the end of the rows after the last select finishes sending records. 
+The method create_result_table of the class select_union shall be re-defined +as virtual. The implementation of this method for the class select_union_send +shall call select_union::create_result_table and then shall build internal +structures needed for select_unionsend::send_data. So, the definition of the +class select_union_send should look like this: + class select_union_send :public select_union + { + ... // private structures + public: + select_union_send() :select_union(), ...{...} + bool send_data(List<Item> &items); + bool send_fields(List<Item> &list, uint flags); + bool create_result_table(THD *thd, List<Item> *column_types, + bool is_distinct, ulonglong options, + const char *alias); + }; + +2.2. Avoiding unnecessary copying +------------------------------------------ + +If a field does not need type conversion it does not make sense to send it to a +record buffer. It can be sent directly to the output stream. Different selects +can require type conversions for different columns. +Let's provide each select from the union unit with a data structure (e.g. a +bitmap) that says what fields require conversions, and what don't . Before +execution of a select this data structure must be passed to the +select_union_send object shared by all selects from the unit. The info in this +structure will tell select_union_send::send_data what fields should be sent to +the record buffer for type conversion and what can be sent directly to the +output stream. In this case another variant of the fill_record procedure is +needed that would take as parameter the info that says what fields are to be +stored in the record buffer. 
+ +2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations +---------------------------------------------------------------------------------------------------------- + +If a union unit with a mix of UNIIN/UNION ALL operations and without ORDER BY is +used at the top level of a query then any UNION ALL operation after the last +UNION operation can be executed in more efficient way than it's done in the +current implementation. More exactly, the rows from any select that follows +after the second operand of the last UNION operations could be sent directly to +the output stream. In this case two interceptor objects have to be created: one, +of the type select_union, is shared by the selects for which UNION operations +are performed, another, of the type select_union_send, is shared by the the +remaining selects. For this optimization the method SELECT_LEX_UNIT::exec is to +undergo a serious re-work. + + +3. Other possible optimizations for union units +================================= + +The following optimizations are not supposed to be implemented in the framework +this task. +1. For a union unit containing only UNION ALL with an ORDER BY send rows from +selects directly to the sorting procedure. +2. For a union unit at the top level of the query without ORDER BY clause send +any row received from an operand of a UNION operation directly to the output +stream as soon as it has been checked by a lookup in the temporary table that +it's not a duplicate. +3. Not to use temporary table for any union unit used in EXIST or IN subquery. + DESCRIPTION: Currently when any union operation is executed the rows received from its operands are always sent to a temporary table. Meanwhile for a UNION ALL operation that is used at the top level of a query without an ORDER BY clause it is not necessary. In this case the rows could be sent directly to the client. 
The goal of this task is to provide an implementation of the UNION ALL operation that does not use a temporary table at all in certain, most common cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
==================================

1.1. Specifics of MySQL union operations
------------------------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL allows us to use these operations in a sequence, one after another. For example the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);  (1)
  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);  (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);  (3)
  (select a1,b1,c1 from t1 where a1=b1) union all (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);  (4)
It should be noted that query (4) is equivalent to query (1). At the same time query (3) is not equivalent to any of the queries (1), (2), (4).
In general any UNION ALL in a sequence of union operations can be equivalently replaced by UNION if another UNION occurs later in the sequence.
MySQL does not accept nested unions.
For example the following valid query is rejected by MySQL Server as erroneous:
  ( (select a1,b1 from t1 where a1=b1) union (select a2,b2 from t2 where a2!=b2) ) union all
  ( (select a3,b3 from t3 where a3=b3) union (select a4,b4 from t4 where a4!=b4) )

A sequence of select constructs separated by UNION/UNION ALL is called a 'union unit' if it is not part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct. In this case it cannot be used as a subquery.

1.2 Validation of union units
----------------------------------

When the parser stage is over, the further processing of a union unit is performed by the function mysql_union.
The function first validates the unit by calling the method SELECT_LEX_UNIT::prepare. The method first validates each of the select constructs of the unit and then checks that all selects are compatible. It checks that the selects return the same number of columns and that for each set of columns at the same position k there is a type to which the types of the columns can be coerced. This type is taken as the type of column k of the result set returned by the union unit.
For example, if in the query (1) the columns b1, b2 and b3 are of the types int, bigint and double respectively then the second column of the union unit will be of the type double. If the types of the columns c1, c2, c3 are specified as varchar(10), varchar(20), varchar(10) then the type of the corresponding column of the result set will be varchar(20). If the columns have different collations then a collation from which all these collations can be derived is looked for and it is assigned as the collation of the third column in the result set.
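The column check described above can be illustrated with a small sketch. It is not the server's type aggregation code: Kind, ColumnType, coerce and union_column_types are hypothetical names, only four types are modelled, collations are left out, and a mix of numeric and string columns is treated as outside the scope of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, heavily simplified column types.
enum class Kind { Int, BigInt, Double, Varchar };

struct ColumnType {
  Kind kind;
  std::size_t length = 0;  // only meaningful for Varchar
};

// Common type of two columns. The numeric kinds are ordered
// Int < BigInt < Double and the wider one wins; two Varchars take the
// larger length. Numeric/Varchar mixes are not modelled (nullopt).
std::optional<ColumnType> coerce(const ColumnType& a, const ColumnType& b) {
  const bool a_str = a.kind == Kind::Varchar;
  const bool b_str = b.kind == Kind::Varchar;
  if (a_str != b_str) return std::nullopt;
  if (a_str) return ColumnType{Kind::Varchar, std::max(a.length, b.length)};
  return ColumnType{std::max(a.kind, b.kind)};
}

// Result column types of a union unit, or nullopt if the selects disagree
// on the number of columns or some position k has no common type.
std::optional<std::vector<ColumnType>> union_column_types(
    const std::vector<std::vector<ColumnType>>& selects) {
  if (selects.empty()) return std::nullopt;
  std::vector<ColumnType> result = selects.front();
  for (std::size_t s = 1; s < selects.size(); ++s) {
    if (selects[s].size() != result.size()) return std::nullopt;
    for (std::size_t k = 0; k < result.size(); ++k) {
      const auto common = coerce(result[k], selects[s][k]);
      if (!common) return std::nullopt;
      result[k] = *common;
    }
  }
  return result;
}
```

Fed with the column types from the example (int, bigint, double and varchar(10), varchar(20), varchar(10)), the sketch yields double and varchar(20) for the two result columns.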
After the compatibility of the corresponding select columns has been checked and the types of the columns of the result set have been determined, the method SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the result set for the union unit. Currently rows returned by the selects from the union unit are always written into a temporary table. To force the selects to send rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for the selects such that the JOIN::result field refers to an object of the class select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has created a temporary table as a container for rows from the result sets returned by the selects of the unit, and has prepared all data structures needed for execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one by one. Each select is first optimized with JOIN::optimize() and then executed with JOIN::exec(). The result rows from each select are sent to a temporary table. This table accumulates all rows that are to be returned by the union unit. For UNION operations duplicate rows are not added; for UNION ALL operations all records are added. This is achieved by enabling and disabling the unique index defined on all fields of the temporary table. The index is never used if only UNION ALL operations occur in the unit. Otherwise it is enabled before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used. For each row it receives from the currently executed select, the method first stores the fields of the row in the fields of the record buffer of the temporary table. To do this the method calls the function fill_record.
All needed type conversions of the field values are performed when they are stored in the record buffer. After this the method select_union::send_data calls the ha_write_row handler function to write the record from the buffer to the temporary table. A possible duplicate-key error raised by an attempt to write a duplicate row is ignored.
After all rows received from all selects have been placed into the temporary table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows from the temporary table and sends them to the output stream (to the client). If there is an ORDER BY clause to be applied to the result of the union unit then the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
=================================================

The following three optimizations are proposed for implementation within this task.

2.1 Execution of UNION ALL without temporary table
------------------------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the query (in other words, it is not used as a subquery) and is not followed by an ORDER BY clause then it does not make sense to send rows received from selects to a temporary table at all. After all needed type conversions have been done the row fields could be sent directly to the output stream. This would improve the performance of UNION ALL operations since writing to the temporary table and reading from it would no longer be needed. When the result set is so big that the temporary table cannot be held in main memory, the performance gains would be significant. Besides, the client could get the first result rows immediately, as it would not have to wait until all selects have been executed.
To make a UNION ALL operation not send rows to a temporary table we could provide the JOIN objects created for the selects from the union unit with an interceptor object that differs from the one they use now. In the current code they use an object of the class select_union, derived from the select_result_interceptor class. The new interceptor object, of a class that we'll call select_union_send (by analogy with the class select_send), shall inherit from select_union and shall have its own implementations of the virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record buffer of the temporary table and then from this buffer to the output stream. The method send_fields shall send the format of the rows to the client before it starts getting records from the first select, while the method send_eof shall signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be redefined as virtual. The implementation of this method for the class select_union_send shall call select_union::create_result_table and then build the internal structures needed for select_union_send::send_data. So, the definition of the class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
------------------------------------------

If a field does not need type conversion it does not make sense to send it to a record buffer. It can be sent directly to the output stream. Different selects can require type conversions for different columns.
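Which columns of a given select need conversion could be recorded as one bit per column. The following is a minimal sketch under that assumption, not a proposal for the actual server interfaces: Row, conversion_bitmap and send_row are hypothetical names, types are compared as plain strings, and the convert callback stands in for the trip through the record buffer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: a row is a vector of string field values.
using Row = std::vector<std::string>;

// One bit per result column: true if the select's column type differs
// from the result column type and therefore needs conversion.
std::vector<bool> conversion_bitmap(const std::vector<std::string>& select_types,
                                    const std::vector<std::string>& result_types) {
  std::vector<bool> bits(result_types.size(), false);
  for (std::size_t k = 0; k < bits.size() && k < select_types.size(); ++k)
    bits[k] = select_types[k] != result_types[k];
  return bits;
}

// Sends one row to `out`. Only fields flagged in the bitmap go through
// `convert`; all other fields are forwarded unchanged.
// Returns the number of fields that were converted.
template <typename Convert>
std::size_t send_row(const Row& row, const std::vector<bool>& needs_conversion,
                     Convert convert, Row& out) {
  std::size_t converted = 0;
  out.clear();
  for (std::size_t k = 0; k < row.size(); ++k) {
    if (k < needs_conversion.size() && needs_conversion[k]) {
      out.push_back(convert(k, row[k]));
      ++converted;
    } else {
      out.push_back(row[k]);  // no conversion needed: forward as is
    }
  }
  return converted;
}
```

Since the bitmap depends on the select, it would be built once per select and consulted for every row that select produces.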
Let's provide each select from the union unit with a data structure (e.g. a bitmap) that says which fields require conversion and which do not. Before execution of a select this data structure must be passed to the select_union_send object shared by all selects from the unit. The info in this structure will tell select_union_send::send_data which fields should be sent to the record buffer for type conversion and which can be sent directly to the output stream. In this case another variant of the fill_record procedure is needed that would take as a parameter the info that says which fields are to be stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
----------------------------------------------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is used at the top level of a query then any UNION ALL operation after the last UNION operation can be executed in a more efficient way than in the current implementation. More precisely, the rows from any select that follows the second operand of the last UNION operation could be sent directly to the output stream. In this case two interceptor objects have to be created: one, of the type select_union, is shared by the selects for which UNION operations are performed; another, of the type select_union_send, is shared by the remaining selects. For this optimization the method SELECT_LEX_UNIT::exec will have to undergo a serious rework.

3. Other possible optimizations for union units
=================================

The following optimizations are not planned for implementation within this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause, send any row received from an operand of a UNION operation directly to the output stream as soon as a lookup in the temporary table has confirmed that it is not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)

[Maria-developers] Updated (by Guest): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

14 Aug '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: -=-=(Guest - Fri, 14 Aug 2009, 08:41)=-=- High-Level Specification modified. DESCRIPTION: Currently when any union operation is executed the rows received from its operands are always sent to a temporary table. Meanwhile for a UNION ALL operation that is used at the top level of a query without an ORDER BY clause it is not necessary. In this case the rows could be sent directly to the client.
The goal of this task is to provide an implementation of the UNION ALL operation
that does not use a temporary table at all in certain, most common cases.

HIGH-LEVEL SPECIFICATION:

<contents>
1. Handling union operations in MySQL Server
  1.1. Specifics of MySQL union operations
  1.2 Validation of union units
  1.3 Execution of union units
2. Optimizations improving performance of UNION ALL operations
  2.1 Execution of UNION ALL without temporary table
  2.2. Avoiding unnecessary copying
  2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
3. Other possible optimizations for union units
</contents>

1. Handling union operations in MySQL Server
==================================

1.1. Specifics of MySQL union operations
------------------------------------------------------

UNION and UNION ALL are the only set operations supported by MySQL Server. MySQL
allows these operations to be used in a sequence, one after another. For example,
the following queries are accepted by the MySQL Server:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);               (1)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);               (2)
Any mix of UNION and UNION ALL is also acceptable:
  (select a1,b1,c1 from t1 where a1=b1) union
  (select a2,b2,c2 from t2 where a2!=b2) union all
  (select a3,b3,c3 from t3 where a3>b3);               (3)
  (select a1,b1,c1 from t1 where a1=b1) union all
  (select a2,b2,c2 from t2 where a2!=b2) union
  (select a3,b3,c3 from t3 where a3>b3);               (4)
Note that query (4) is equivalent to query (1), whereas query (3) is not
equivalent to any of the queries (1), (2), (4).
In general, any UNION ALL in a sequence of union operations can be replaced with
UNION without changing the result if another UNION occurs later in the sequence.
MySQL does not accept nested unions.
For example, the following query, although valid SQL, is rejected by MySQL
Server as erroneous:
  ( (select a1,b1 from t1 where a1=b1) union
    (select a2,b2 from t2 where a2!=b2) ) union all
  ( (select a3,b3 from t3 where a3=b3) union
    (select a4,b4 from t4 where a4!=b4) )

A sequence of select constructs separated by UNION/UNION ALL is called a 'union
unit' if it is not part of another such sequence.
A union unit can be executed as a query. It can also be used as a subquery.
A union unit can optionally be followed by an ORDER BY and/or LIMIT construct.
In this case it cannot be used as a subquery.

1.2 Validation of union units
----------------------------------

When the parser stage is over, further processing of a union unit is performed
by the function mysql_union.
The function first validates the unit by calling the method
SELECT_LEX_UNIT::prepare. The method first validates each of the select
constructs of the unit and then checks that all selects are compatible: the
selects must return the same number of columns, and for each column position k
there must be a type to which the types of the k-th columns of all selects can
be coerced. This type is taken as the type of column k of the result set
returned by the union unit.
For example, if in query (1) the columns b1, b2 and b3 are of the types int,
bigint and double respectively, then the second column of the union unit will be
of the type double. If the types of the columns c1, c2, c3 are specified as
varchar(10), varchar(20), varchar(10), then the type of the corresponding column
of the result set will be varchar(20). If the columns have different collations,
then the server looks for a collation from which all these collations can be
derived and assigns it as the collation of the third column in the result set.
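The type aggregation described above can be illustrated with a small sketch. It
is a simplified model and not server code: the names Kind, ColType, aggregate and
union_column_types are hypothetical, only int, bigint, double and varchar are
modelled, and collations are left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified column type model (not the server's structures).
// Enumerators are ordered so that a later kind can hold any earlier one.
enum class Kind { Int = 0, BigInt = 1, Double = 2, Varchar = 3 };

struct ColType {
    Kind kind;
    std::size_t length;  // only meaningful for Varchar, 0 otherwise
};

// Pick a type to which both inputs can be coerced: the "wider" kind and the
// larger length. Treating any mix with Varchar as Varchar is an assumption of
// this sketch, not a statement about the server's real coercion rules.
ColType aggregate(const ColType& a, const ColType& b) {
    return ColType{std::max(a.kind, b.kind), std::max(a.length, b.length)};
}

// Column k of the union unit aggregates column k of every select.
// An empty result means the selects disagree on the number of columns.
std::vector<ColType> union_column_types(
        const std::vector<std::vector<ColType>>& selects) {
    if (selects.empty()) return {};
    std::vector<ColType> result = selects.front();
    for (std::size_t s = 1; s < selects.size(); ++s) {
        if (selects[s].size() != result.size()) return {};
        for (std::size_t k = 0; k < result.size(); ++k)
            result[k] = aggregate(result[k], selects[s][k]);
    }
    return result;
}
```

With the column types of query (1) from the text as input (b1, b2, b3 being int,
bigint, double and c1, c2, c3 being varchar(10), varchar(20), varchar(10)), the
second result column comes out as double and the third as varchar(20).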
After the compatibility of the corresponding select columns has been checked and
the types of the columns of the result set have been determined, the method
SELECT_LEX_UNIT::prepare creates a temporary table to store the rows of the
result set for the union unit. Currently rows returned by the selects from the
union unit are always written into a temporary table. To force selects to send
rows to this temporary table, SELECT_LEX_UNIT::prepare creates JOIN objects for
the selects such that the JOIN::result field refers to an object of the class
select_union. All selects from a union unit share the same select_union object.

1.3 Execution of union units
----------------------------------

After SELECT_LEX_UNIT::prepare has successfully validated the union unit, has
created a temporary table as a container for rows from the result sets returned
by the selects of the unit, and has prepared all data structures needed for
execution, the function mysql_union invokes SELECT_LEX_UNIT::exec.
The method SELECT_LEX_UNIT::exec processes the selects from the union unit one
by one.
Each select is first optimized with JOIN::optimize(), then it is executed with
JOIN::exec(). The result rows from each select are sent to a temporary table.
This table accumulates all rows that are to be returned by the union unit. For
UNION operations duplicate rows are not added; for UNION ALL operations all
records are added. This is achieved by enabling and disabling usage of the
unique index defined on all fields of the temporary table. The index is never
used if only UNION ALL operations occur in the unit. Otherwise it is enabled
before the first select is executed and disabled after the last UNION operation.
To send rows to the temporary table the method select_union::send_data is used.
For each row it receives from the currently executed select, the method first
stores the fields of the row in the fields of the record buffer of the temporary
table. To do this the method calls the function fill_record.
All needed type conversions of the field values are performed when they are
stored in the record buffer. After this the method select_union::send_data calls
the ha_write_row handler function to write the record from the buffer to the
temporary table. A duplicate key error that may occur on an attempt to write a
duplicate row is ignored.
After all rows received from all selects have been placed into the temporary
table, the method SELECT_LEX_UNIT::exec calls mysql_select, which reads rows
from the temporary table and sends them to the output stream (to the client). If
there is an ORDER BY clause to be applied to the result of the union unit, then
the rows read from the temporary table have to be sorted first.

2. Optimizations improving performance of UNION ALL operations
=================================================

The following three optimizations are proposed to be implemented in the
framework of this task.

2.1 Execution of UNION ALL without temporary table
------------------------------------------------------------------

If a union unit with only UNION ALL operations is used at the top level of the
query (in other words, it is not used as a subquery) and is not followed by an
ORDER BY clause, then it does not make sense to send rows received from selects
to a temporary table at all. After all needed type conversions have been done,
the row fields could be sent directly into the output stream. This would improve
the performance of UNION ALL operations, since writing to the temporary table
and reading from it would no longer be needed. In the cases when the result set
is so big that the temporary table cannot be allocated in main memory, the
performance gains would be significant. Besides, the client could get the first
result rows at once, as it would not have to wait until all selects have been
executed.
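The difference between the current execution scheme and the proposed one can be
sketched with a toy model. Everything in it is an assumption made for
illustration: rows are plain vectors of integers, a std::set stands in for the
unique index of the temporary table, and the names Row, RowSet, Sink,
exec_with_temp_table and exec_streaming do not exist in the server.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

using Row = std::vector<int>;
using RowSet = std::vector<Row>;              // rows produced by one select
using Sink = std::function<void(const Row&)>; // stands in for the client

// distinct[i] describes the operation between select i and select i+1:
// true for UNION, false for UNION ALL.

// Current scheme: every row is written to a "temporary table" first. The
// unique index is consulted only up to the right operand of the last UNION.
void exec_with_temp_table(const std::vector<RowSet>& selects,
                          const std::vector<bool>& distinct,
                          const Sink& out) {
    assert(selects.empty() ? distinct.empty()
                           : distinct.size() + 1 == selects.size());
    std::size_t dedup_until = 0;  // leading selects written with the index on
    for (std::size_t i = 0; i < distinct.size(); ++i)
        if (distinct[i]) dedup_until = i + 2;

    RowSet temp_table;
    std::set<Row> unique_index;
    for (std::size_t s = 0; s < selects.size(); ++s) {
        for (const Row& row : selects[s]) {
            if (s < dedup_until && !unique_index.insert(row).second)
                continue;  // duplicate key: the error is ignored, row dropped
            temp_table.push_back(row);
        }
    }
    // Only now, after all selects have run, are rows sent to the client.
    for (const Row& row : temp_table) out(row);
}

// Proposed scheme for a unit with only UNION ALL: each row is passed on as
// soon as its select produces it, and nothing is buffered.
void exec_streaming(const std::vector<RowSet>& selects, const Sink& out) {
    for (const RowSet& rows : selects)
        for (const Row& row : rows) out(row);
}
```

For a unit with only UNION ALL both functions deliver the same rows in the same
order, but the streaming variant needs no intermediate storage and can deliver
the first row before the second select has started. The model also reproduces
the observation from section 1.1 that a UNION ALL followed later by a UNION
behaves like a UNION.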
To keep a UNION ALL operation from sending rows to a temporary table we could
provide the JOIN objects created for the selects from the union unit with an
interceptor object that differs from the one they use now. In the current code
they use an object of the class select_union, derived from the
select_result_interceptor class. The new interceptor object, of a class that
we'll call select_union_send (by analogy with the class select_send), shall
inherit from select_union and shall have its own implementations of the
virtual methods send_data, send_fields, and send_eof.
The method send_data shall send fields received from selects to the record
buffer of the temporary table and then from this buffer to the output stream.
The method send_fields shall send the format of the rows to the client before it
starts getting records from the first select, while the method send_eof shall
signal the end of the rows after the last select finishes sending records.
The method create_result_table of the class select_union shall be re-defined
as virtual. The implementation of this method for the class select_union_send
shall call select_union::create_result_table and then shall build the internal
structures needed for select_union_send::send_data. So, the definition of the
class select_union_send should look like this:
  class select_union_send :public select_union
  {
    ... // private structures
  public:
    select_union_send() :select_union(), ...{...}
    bool send_data(List<Item> &items);
    bool send_fields(List<Item> &list, uint flags);
    bool send_eof();
    bool create_result_table(THD *thd, List<Item> *column_types,
                             bool is_distinct, ulonglong options,
                             const char *alias);
  };

2.2. Avoiding unnecessary copying
------------------------------------------

If a field does not need type conversion it does not make sense to send it to a
record buffer. It can be sent directly to the output stream. Different selects
can require type conversions for different columns.
Let's provide each select from the union unit with a data structure (e.g. a
bitmap) that says which fields require conversions and which don't. Before
execution of a select this data structure must be passed to the
select_union_send object shared by all selects from the unit. The info in this
structure will tell select_union_send::send_data which fields should be sent to
the record buffer for type conversion and which can be sent directly to the
output stream. In this case another variant of the fill_record procedure is
needed that would take as a parameter the info that says which fields are to be
stored in the record buffer.

2.3 Optimizing execution of a union unit with a mix of UNION/UNION ALL operations
----------------------------------------------------------------------------------------------------------

If a union unit with a mix of UNION/UNION ALL operations and without ORDER BY is
used at the top level of a query, then any UNION ALL operation after the last
UNION operation can be executed in a more efficient way than in the current
implementation. More precisely, the rows from any select that follows the second
operand of the last UNION operation could be sent directly to the output stream.
In this case two interceptor objects have to be created: one, of the type
select_union, is shared by the selects for which UNION operations are performed;
the other, of the type select_union_send, is shared by the remaining selects.
For this optimization the method SELECT_LEX_UNIT::exec is to undergo a serious
re-work.


3. Other possible optimizations for union units
=================================

The following optimizations are not supposed to be implemented in the framework
of this task.
1. For a union unit containing only UNION ALL with an ORDER BY, send rows from
selects directly to the sorting procedure.
2. For a union unit at the top level of the query without an ORDER BY clause,
send any row received from an operand of a UNION operation directly to the
output stream as soon as a lookup in the temporary table has confirmed that it
is not a duplicate.
3. Do not use a temporary table for any union unit used in an EXISTS or IN
subquery.

ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)


[Maria-developers] New (by Igor): Implement UNION ALL without usage of a temporary table (44)
by worklog-noreply＠askmonty.org 14 Aug '09

14 Aug '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Implement UNION ALL without usage of a temporary table CREATION DATE..: Fri, 14 Aug 2009, 08:31 SUPERVISOR.....: Bothorsen IMPLEMENTOR....: COPIES TO......: Monty, Psergey CATEGORY.......: Client-BackLog TASK ID........: 44 (http://askmonty.org/worklog/?tid=44) VERSION........: Benchmarks-3.0 STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 0 (hours remain) ORIG. ESTIMATE.: 0 PROGRESS NOTES: DESCRIPTION: Currently when any union operation is executed the rows received from its operands are always sent to a temporary table. Meanwhile for a UNION ALL operation that is used at the top level of a query without an ORDER BY clause it is not necessary. In this case the rows could be sent directly to the client. The goal of this task is to provide such an implementation of UNION ALL operation that would not use temporary table at all in certain, most usable cases. ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)


[Maria-developers] Rev 2720: Merge maria-5.1 -> maria-5.1-table-elimination in file:///home/psergey/dev/maria-5.1-table-elim-r10/
by Sergey Petrunya 13 Aug '09

13 Aug '09

At file:///home/psergey/dev/maria-5.1-table-elim-r10/ ------------------------------------------------------------ revno: 2720 revision-id: psergey(a)askmonty.org-20090813211212-jghejwxsl6adtopl parent: knielsen(a)knielsen-hq.org-20090805072137-wg97dcem1cxnzt3p parent: psergey(a)askmonty.org-20090813204452-o8whzlbio19cgkyv committer: Sergey Petrunya <psergey(a)askmonty.org> branch nick: maria-5.1-table-elim-r10 timestamp: Fri 2009-08-14 01:12:12 +0400 message: Merge maria-5.1 -> maria-5.1-table-elimination added: mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1 mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1 sql-bench/test-table-elimination.sh testtableelimination-20090616194329-gai92muve732qknl-1 sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1 modified: .bzrignore sp1f-ignore-20001018235455-q4gxfbritt5f42nwix354ufpsvrf5ebj libmysqld/Makefile.am sp1f-makefile.am-20010411110351-26htpk3ynkyh7pkfvnshztqrxx3few4g mysql-test/r/mysql-bug41486.result mysqlbug41486.result-20090323135900-fobg67a3yzg0b7e8-1 mysql-test/r/ps_11bugs.result sp1f-ps_11bugs.result-20041012140047-4pktjlfeq27q6bxqfdsbcszr5nybv6zz mysql-test/r/select.result sp1f-select.result-20010103001548-znkoalxem6wchsbxizfosjhpfmhfyxuk mysql-test/r/subselect.result sp1f-subselect.result-20020512204640-zgegcsgavnfd7t7eyrf7ibuqomsw7uzo mysql-test/r/union.result sp1f-unions_one.result-20010725122836-ofxtwraxeohz7whhrmfdz57sl4a5prmp mysql-test/t/mysql-bug41486.test mysqlbug41486.test-20090323135900-fobg67a3yzg0b7e8-2 mysql-test/valgrind.supp sp1f-valgrind.supp-20050406142216-yg7xhezklqhgqlc3inx36vbghodhbovy sql/CMakeLists.txt sp1f-cmakelists.txt-20060831175237-esoeu5kpdtwjvehkghwy6fzbleniq2wy sql/Makefile.am sp1f-makefile.am-19700101030959-xsjdiakci3nqcdd4xl4yomwdl5eo2f3q sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa 
sql/item_subselect.cc sp1f-item_subselect.cc-20020512204640-qep43aqhsfrwkqmrobni6czc3fqj36oo sql/item_subselect.h sp1f-item_subselect.h-20020512204640-qdg77wil56cxyhtc2bjjdrppxq3wqgh3 sql/item_sum.cc sp1f-item_sum.cc-19700101030959-4woo23bi3am2t2zvsddqbpxk7xbttdkm sql/item_sum.h sp1f-item_sum.h-19700101030959-ecgohlekwm355wxl5fv4zzq3alalbwyl sql/sql_bitmap.h sp1f-sql_bitmap.h-20031024204444-g4eiad7vopzqxe2trxmt3fn3xsvnomvj sql/sql_lex.cc sp1f-sql_lex.cc-19700101030959-4pizwlu5rqkti27gcwsvxkawq6bc2kph sql/sql_lex.h sp1f-sql_lex.h-19700101030959-sgldb2sooc7twtw5q7pgjx7qzqiaa3sn sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija ------------------------------------------------------------ revno: 2707.1.27 revision-id: psergey(a)askmonty.org-20090813204452-o8whzlbio19cgkyv parent: psergey(a)askmonty.org-20090813191053-g1xfeieoti4bqgbc committer: Sergey Petrunya <psergey(a)askmonty.org> branch nick: maria-5.1-table-elim-r5 timestamp: Fri 2009-08-14 00:44:52 +0400 message: MWL#17: Table elimination - More function renames, added comments modified: sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1 ------------------------------------------------------------ revno: 2707.1.26 revision-id: psergey(a)askmonty.org-20090813191053-g1xfeieoti4bqgbc parent: psergey(a)askmonty.org-20090813093613-hy7tdlsgdy83xszq committer: Sergey Petrunya <psergey(a)askmonty.org> branch nick: maria-5.1-table-elim-r5 timestamp: Thu 2009-08-13 23:10:53 +0400 message: MWL#17: Table elimination - Better comments modified: sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1 sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb ------------------------------------------------------------ revno: 2707.1.25 revision-id: 
psergey(a)askmonty.org-20090813093613-hy7tdlsgdy83xszq
parent: psergey(a)askmonty.org-20090813092402-jlqucf6nultxlv4b
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 13:36:13 +0400
message:
  MWL#17: Table elimination
  Fixes after post-review fixes:
  - Don't search for tables in JOIN_TAB array. it's not initialized yet. use select_lex->leaf_tables instead.
modified:
  sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
------------------------------------------------------------
revno: 2707.1.24
revision-id: psergey(a)askmonty.org-20090813092402-jlqucf6nultxlv4b
parent: psergey(a)askmonty.org-20090813000143-dukzk352hjywidk7
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 13:24:02 +0400
message:
  MWL#17: Table elimination
  - Post-postreview changes fix: Do set NESTED_JOIN::n_tables to number of tables left after elimination.
modified:
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.23
revision-id: psergey(a)askmonty.org-20090813000143-dukzk352hjywidk7
parent: psergey(a)askmonty.org-20090812234302-10es7qmf0m09ahbq
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 04:01:43 +0400
message:
  MWL#17: Table elimination
  - When making inferences "field is bound" -> "key is bound", do check that the field is part of the key
modified:
  sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
------------------------------------------------------------
revno: 2707.1.22
revision-id: psergey(a)askmonty.org-20090812234302-10es7qmf0m09ahbq
parent: psergey(a)askmonty.org-20090812223421-w4xyzj7azqgo83ps
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 03:43:02 +0400
message:
  MWL#17: Table elimination
  - Continue addressing review feedback: remove "unusable KEYUSEs" extension as it is no longer needed.
modified:
  sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
  sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
  sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.1.21
revision-id: psergey(a)askmonty.org-20090812223421-w4xyzj7azqgo83ps
parent: psergey(a)askmonty.org-20090708171038-9nyc3hcg1o7h8635
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 02:34:21 +0400
message:
  MWL#17: Table elimination
  Address review feedback:
  - Change from Wave-based approach (a-la const table detection) to building and walking functional dependency graph.
  - Change from piggy-backing on ref-access code and KEYUSE structures to using our own expression analyzer.
modified:
  sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
  sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
  sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
  sql/sql_bitmap.h sp1f-sql_bitmap.h-20031024204444-g4eiad7vopzqxe2trxmt3fn3xsvnomvj
------------------------------------------------------------
revno: 2707.1.20
revision-id: psergey(a)askmonty.org-20090708171038-9nyc3hcg1o7h8635
parent: psergey(a)askmonty.org-20090630132018-8qwou8bqiq5z1qjg
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-07-08 21:10:38 +0400
message:
  MWL#17: Table elimination
  - When collecting Item_subselect::refers_to, put references to the correct subselect entry.
modified:
  sql/sql_lex.cc sp1f-sql_lex.cc-19700101030959-4pizwlu5rqkti27gcwsvxkawq6bc2kph
------------------------------------------------------------
revno: 2707.1.19
revision-id: psergey(a)askmonty.org-20090630132018-8qwou8bqiq5z1qjg
parent: psergey(a)askmonty.org-20090630131100-r6o8yqzse4yvny9l
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Tue 2009-06-30 17:20:18 +0400
message:
  MWL#17: Table elimination
  - More comments
  - Renove old code
modified:
  sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
------------------------------------------------------------
revno: 2707.1.18
revision-id: psergey(a)askmonty.org-20090630131100-r6o8yqzse4yvny9l
parent: psergey(a)askmonty.org-20090629135115-472up9wsj0dq843i
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Tue 2009-06-30 17:11:00 +0400
message:
  MWL#17: Table elimination
  - Last fixes
modified:
  sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
  sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
  sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
  sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
  sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija
------------------------------------------------------------
revno: 2707.1.17
revision-id: psergey(a)askmonty.org-20090629135115-472up9wsj0dq843i
parent: psergey(a)askmonty.org-20090625200729-u11xpwwn5ebddx09
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Mon 2009-06-29 17:51:15 +0400
message:
  MWL#17: Table elimination
modified:
  mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
  mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
  sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
  sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
  sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija
------------------------------------------------------------
revno: 2707.1.16
revision-id: psergey(a)askmonty.org-20090625200729-u11xpwwn5ebddx09
parent: psergey(a)askmonty.org-20090625100947-mg9xwnbeyyjgzl3w
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-movearound
timestamp: Fri 2009-06-26 00:07:29 +0400
message:
  MWL#17: Table elimination
  - Better comments, variable/function renames
modified:
  sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
  sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.1.15
revision-id: psergey(a)askmonty.org-20090625100947-mg9xwnbeyyjgzl3w
parent: psergey(a)askmonty.org-20090624224414-71xqbljy8jf4z1qs
parent: psergey(a)askmonty.org-20090625100553-j1xenbz3o5nekiu2
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Thu 2009-06-25 14:09:47 +0400
message:
  Automerge
added:
  sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
modified:
  .bzrignore sp1f-ignore-20001018235455-q4gxfbritt5f42nwix354ufpsvrf5ebj
  libmysqld/Makefile.am sp1f-makefile.am-20010411110351-26htpk3ynkyh7pkfvnshztqrxx3few4g
  sql/CMakeLists.txt sp1f-cmakelists.txt-20060831175237-esoeu5kpdtwjvehkghwy6fzbleniq2wy
  sql/Makefile.am sp1f-makefile.am-19700101030959-xsjdiakci3nqcdd4xl4yomwdl5eo2f3q
  sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
  sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
  sql/item_subselect.cc sp1f-item_subselect.cc-20020512204640-qep43aqhsfrwkqmrobni6czc3fqj36oo
  sql/item_sum.h sp1f-item_sum.h-19700101030959-ecgohlekwm355wxl5fv4zzq3alalbwyl
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
  sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.3.1
revision-id: psergey(a)askmonty.org-20090625100553-j1xenbz3o5nekiu2
parent: psergey(a)askmonty.org-20090624090104-c63mp3sfxcxytk0d
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-movearound
timestamp: Thu 2009-06-25 14:05:53 +0400
message:
  MWL#17: Table elimination
  - Moved table elimination code to sql/opt_table_elimination.cc
  - Added comments
added:
  sql/opt_table_elimination.cc opt_table_eliminatio-20090625095316-7ka9w3zr7n5114iv-1
modified:
  .bzrignore sp1f-ignore-20001018235455-q4gxfbritt5f42nwix354ufpsvrf5ebj
  libmysqld/Makefile.am sp1f-makefile.am-20010411110351-26htpk3ynkyh7pkfvnshztqrxx3few4g
  sql/CMakeLists.txt sp1f-cmakelists.txt-20060831175237-esoeu5kpdtwjvehkghwy6fzbleniq2wy
  sql/Makefile.am sp1f-makefile.am-19700101030959-xsjdiakci3nqcdd4xl4yomwdl5eo2f3q
  sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
  sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
  sql/item_subselect.cc sp1f-item_subselect.cc-20020512204640-qep43aqhsfrwkqmrobni6czc3fqj36oo
  sql/item_sum.h sp1f-item_sum.h-19700101030959-ecgohlekwm355wxl5fv4zzq3alalbwyl
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
  sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.1.14
revision-id: psergey(a)askmonty.org-20090624224414-71xqbljy8jf4z1qs
parent: psergey(a)askmonty.org-20090624090104-c63mp3sfxcxytk0d
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Thu 2009-06-25 02:44:14 +0400
message:
  MWL#17: Table elimination
  - fix a typo bug in has_eqref_access_candidate()
  - Adjust test to remove race condition
modified:
  mysql-test/r/mysql-bug41486.result mysqlbug41486.result-20090323135900-fobg67a3yzg0b7e8-1
  mysql-test/t/mysql-bug41486.test mysqlbug41486.test-20090323135900-fobg67a3yzg0b7e8-2
  sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
------------------------------------------------------------
revno: 2707.1.13
revision-id: psergey(a)askmonty.org-20090624090104-c63mp3sfxcxytk0d
parent: psergey(a)askmonty.org-20090623200613-w9dl8g41ysf51r80
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-24 13:01:04 +0400
message:
  More comments
modified:
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.12
revision-id: psergey(a)askmonty.org-20090623200613-w9dl8g41ysf51r80
parent: psergey(a)askmonty.org-20090622114631-yop0q2p8ktmfnctm
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-24 00:06:13 +0400
message:
  MWL#17: Table elimination
  - More testcases
  - Let add_ft_key() set keyuse->usable
modified:
  mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
  mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
  sql-bench/test-table-elimination.sh testtableelimination-20090616194329-gai92muve732qknl-1
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.11
revision-id: psergey(a)askmonty.org-20090622114631-yop0q2p8ktmfnctm
parent: psergey(a)askmonty.org-20090617052739-37i1r8lip0m4ft9r
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Mon 2009-06-22 15:46:31 +0400
message:
  MWL#17: Table elimination
  - Make elimination check to be able detect cases like t.primary_key_col1=othertbl.col AND t.primary_key_col2=func(t.primary_key_col1). These are needed to handle e.g. the case of func() being a correlated subquery that selects the latest value.
  - If we've removed a condition with subquery predicate, EXPLAIN [EXTENDED] won't show the subquery anymore
modified:
  sql/item.cc sp1f-item.cc-19700101030959-u7hxqopwpfly4kf5ctlyk2dvrq4l3dhn
  sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
  sql/item_subselect.cc sp1f-item_subselect.cc-20020512204640-qep43aqhsfrwkqmrobni6czc3fqj36oo
  sql/item_subselect.h sp1f-item_subselect.h-20020512204640-qdg77wil56cxyhtc2bjjdrppxq3wqgh3
  sql/item_sum.cc sp1f-item_sum.cc-19700101030959-4woo23bi3am2t2zvsddqbpxk7xbttdkm
  sql/sql_lex.cc sp1f-sql_lex.cc-19700101030959-4pizwlu5rqkti27gcwsvxkawq6bc2kph
  sql/sql_lex.h sp1f-sql_lex.h-19700101030959-sgldb2sooc7twtw5q7pgjx7qzqiaa3sn
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
  sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.1.10
revision-id: psergey(a)askmonty.org-20090617052739-37i1r8lip0m4ft9r
parent: psergey(a)askmonty.org-20090616204358-yjkyfxczsomrn9yn
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-17 09:27:39 +0400
message:
  * Use excessive parentheses to stop compiler warning
  * Fix test results to account for changes in previous cset
modified:
  mysql-test/r/select.result sp1f-select.result-20010103001548-znkoalxem6wchsbxizfosjhpfmhfyxuk
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.9
revision-id: psergey(a)askmonty.org-20090616204358-yjkyfxczsomrn9yn
parent: psergey(a)askmonty.org-20090616195413-rfmi9un20za8gn8g
parent: psergey(a)askmonty.org-20090615162208-p4w8s8jo06bdz1vj
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-17 00:43:58 +0400
message:
  * Merge
  * Change valgrind suppression to work on valgrind 3.3.0
modified:
  mysql-test/valgrind.supp sp1f-valgrind.supp-20050406142216-yg7xhezklqhgqlc3inx36vbghodhbovy
------------------------------------------------------------
revno: 2707.2.1
revision-id: psergey(a)askmonty.org-20090615162208-p4w8s8jo06bdz1vj
parent: psergey(a)askmonty.org-20090614205924-1vnfwbuo4brzyfhp
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-movearound
timestamp: Mon 2009-06-15 20:22:08 +0400
message:
  Fix spurious valgrind warnings in rpl_trigger.test
modified:
  mysql-test/valgrind.supp sp1f-valgrind.supp-20050406142216-yg7xhezklqhgqlc3inx36vbghodhbovy
------------------------------------------------------------
revno: 2707.1.8
revision-id: psergey(a)askmonty.org-20090616195413-rfmi9un20za8gn8g
parent: psergey(a)askmonty.org-20090614205924-1vnfwbuo4brzyfhp
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Tue 2009-06-16 23:54:13 +0400
message:
  MWL#17: Table elimination
  - Move eliminate_tables() to before constant table detection.
  - First code for benchmark
added:
  sql-bench/test-table-elimination.sh testtableelimination-20090616194329-gai92muve732qknl-1
modified:
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.7
revision-id: psergey(a)askmonty.org-20090614205924-1vnfwbuo4brzyfhp
parent: psergey(a)askmonty.org-20090614123504-jf4pcb333ojwaxfy
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Mon 2009-06-15 00:59:24 +0400
message:
  MWL#17: Table elimination
  - Fix print_join() to work both for EXPLAIN EXTENDED (after table elimination) and for CREATE VIEW (after join->prepare() but without any optimization).
modified:
  mysql-test/r/union.result sp1f-unions_one.result-20010725122836-ofxtwraxeohz7whhrmfdz57sl4a5prmp
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.6
revision-id: psergey(a)askmonty.org-20090614123504-jf4pcb333ojwaxfy
parent: psergey(a)askmonty.org-20090614100110-u7l54gk0b6zbtj50
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Sun 2009-06-14 16:35:04 +0400
message:
  MWL#17: Table elimination
  - Fix the previous cset: take into account that select_lex may be printed when
    1. There is no select_lex->join at all (in that case, assume that no tables were eliminated)
    2. select_lex->join exists but there was no JOIN::optimize() call yet. handle this by initializing join->eliminated really early.
modified:
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
  sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
------------------------------------------------------------
revno: 2707.1.5
revision-id: psergey(a)askmonty.org-20090614100110-u7l54gk0b6zbtj50
parent: psergey(a)askmonty.org-20090609211133-wfau2tgwo2vpgc5d
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Sun 2009-06-14 14:01:10 +0400
message:
  MWL#17: Table elimination
  - Do not show eliminated tables in the output of EXPLAIN EXTENDED
modified:
  mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
  mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
  sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
  sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija
------------------------------------------------------------
revno: 2707.1.4
revision-id: psergey(a)askmonty.org-20090609211133-wfau2tgwo2vpgc5d
parent: psergey(a)askmonty.org-20090608135546-ut1yrzbah4gdw6e6
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-10 01:11:33 +0400
message:
  MWL#17: Table elimination
  - Make elimination work with aggregate functions. The problem was that aggregate functions reported all table bits in used_tables(), and that prevented table elimination. Fixed by making aggregate functions return more correct value from used_tables().
modified:
  mysql-test/r/ps_11bugs.result sp1f-ps_11bugs.result-20041012140047-4pktjlfeq27q6bxqfdsbcszr5nybv6zz
  mysql-test/r/subselect.result sp1f-subselect.result-20020512204640-zgegcsgavnfd7t7eyrf7ibuqomsw7uzo
  mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
  mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
  sql/item.h sp1f-item.h-19700101030959-rrkb43htudd62batmoteashkebcwykpa
  sql/item_sum.cc sp1f-item_sum.cc-19700101030959-4woo23bi3am2t2zvsddqbpxk7xbttdkm
  sql/item_sum.h sp1f-item_sum.h-19700101030959-ecgohlekwm355wxl5fv4zzq3alalbwyl
------------------------------------------------------------
revno: 2707.1.3
revision-id: psergey(a)askmonty.org-20090608135546-ut1yrzbah4gdw6e6
parent: psergey(a)askmonty.org-20090607182938-ycajee5ozg33b7c8
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-fix
timestamp: Mon 2009-06-08 17:55:46 +0400
message:
  Fix valgrind failure: provide an implementation of strmov_overlapp() that really can handle overlapping.
added:
  strings/strmov_overlapp.c strmov_overlapp.c-20090608135132-403c5p4dlnexqwxi-1
modified:
  include/m_string.h sp1f-m_string.h-19700101030959-rraattbvw5ffkokv4sixxf3s7brqqaga
  libmysql/Makefile.shared sp1f-makefile.shared-20000818182429-m3kdhxi23vorlqjct2y2hl3yw357jtxt
  strings/Makefile.am sp1f-makefile.am-19700101030959-jfitkanzc3r4h2otoyaaprgqn7muf4ux
------------------------------------------------------------
revno: 2707.1.2
revision-id: psergey(a)askmonty.org-20090607182938-ycajee5ozg33b7c8
parent: psergey(a)askmonty.org-20090603182330-ll3gc91iowhtgb23
parent: psergey(a)askmonty.org-20090607182403-6sfpvdr7nkkekcy9
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1
timestamp: Sun 2009-06-07 22:29:38 +0400
message:
  Merge MWL#17: Table elimination
modified:
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2705.2.2
revision-id: psergey(a)askmonty.org-20090607182403-6sfpvdr7nkkekcy9
parent: psergey(a)askmonty.org-20090603131045-c8jqhwlanli7eimv
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Sun 2009-06-07 22:24:03 +0400
message:
  MWL#17: Table Elimination
  - Fix trivial valgrind warning
modified:
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
------------------------------------------------------------
revno: 2707.1.1
revision-id: psergey(a)askmonty.org-20090603182330-ll3gc91iowhtgb23
parent: knielsen(a)knielsen-hq.org-20090602110359-n4q9gof38buucrny
parent: psergey(a)askmonty.org-20090603131045-c8jqhwlanli7eimv
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1
timestamp: Wed 2009-06-03 22:23:30 +0400
message:
  Merge MWL#17 with maria/5.1
added:
  mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
  mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
modified:
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
  sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
  sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija
------------------------------------------------------------
revno: 2705.2.1
revision-id: psergey(a)askmonty.org-20090603131045-c8jqhwlanli7eimv
parent: knielsen(a)knielsen-hq.org-20090522175325-xpwm83ilnhqoqjz0
committer: Sergey Petrunia <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim
timestamp: Wed 2009-06-03 17:10:45 +0400
message:
  MWL#17: Table elimination
  - First code. Elimination works for simple cases, passes the testsuite.
  - Known issues:
    = No elimination is done for aggregate functions.
    = EXPLAIN EXTENDED shows eliminated tables (I think it better not)
    = No benchmark yet
    = The code needs some polishing.
added:
  mysql-test/r/table_elim.result table_elim.result-20090603125022-nge13y0ohk1g2tt2-1
  mysql-test/t/table_elim.test table_elim.test-20090603125018-ka3vcfrm07bsldz8-1
modified:
  sql/sql_select.cc sp1f-sql_select.cc-19700101030959-egb7whpkh76zzvikycs5nsnuviu4fdlb
  sql/sql_select.h sp1f-sql_select.h-19700101030959-oqegfxr76xlgmrzd6qlevonoibfnwzoz
  sql/table.h sp1f-table.h-19700101030959-dv72bajftxj5fbdjuajquappanuv2ija

Diff too large for email (3022 lines, the limit is 1000).
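
Editorial illustration (not part of the original mail): the log for revno 2707.1.21 mentions a change to "building and walking functional dependency graph". The sketch below shows one way such a walk might look. It is not the code from the patch: DepNode, DepGraph, add_node, add_edge and propagate are hypothetical names invented for this example. The only idea borrowed from the thread is that a unique key becomes bound once all of its key parts are bound, and a table becomes bound once any of its unique keys is bound.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical, simplified node; not the Func_dep class from the patch.
struct DepNode {
  int missing_inputs;                   // inputs that must still become bound
  bool bound;                           // true once the value is determined
  std::vector<std::size_t> dependents;  // nodes that depend on this node
};

struct DepGraph {
  std::vector<DepNode> nodes;

  // For an "all inputs" node (e.g. a unique key) pass the number of inputs;
  // for an "any input" node (e.g. a table with several unique keys) pass 1.
  std::size_t add_node(int missing_inputs) {
    nodes.push_back(DepNode{missing_inputs, false, {}});
    return nodes.size() - 1;
  }

  // Record that node `to` depends on node `from`.
  void add_edge(std::size_t from, std::size_t to) {
    assert(from < nodes.size() && to < nodes.size());
    nodes[from].dependents.push_back(to);
  }

  // Mark the given nodes as bound and walk the graph, binding every node
  // whose missing-input counter drops to zero.
  void propagate(const std::vector<std::size_t> &newly_bound) {
    std::queue<std::size_t> work;
    for (std::size_t i : newly_bound) {
      if (!nodes[i].bound) {
        nodes[i].bound = true;
        work.push(i);
      }
    }
    while (!work.empty()) {
      std::size_t cur = work.front();
      work.pop();
      for (std::size_t dep : nodes[cur].dependents) {
        DepNode &n = nodes[dep];
        if (n.bound)
          continue;  // already bound through another path
        if (--n.missing_inputs <= 0) {
          n.bound = true;
          work.push(dep);
        }
      }
    }
  }
};
```

Under these assumptions, a table with a unique key (a, b) would be modelled as two field nodes feeding a key node created with add_node(2), which in turn feeds a table node created with add_node(1). The table node becomes bound only after both fields are bound.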

[Maria-developers] Rev 2734: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09

At file:///home/psergey/dev/maria-5.1-table-elim-r5/ ------------------------------------------------------------ revno: 2734 revision-id: psergey(a)askmonty.org-20090813204452-o8whzlbio19cgkyv parent: psergey(a)askmonty.org-20090813191053-g1xfeieoti4bqgbc committer: Sergey Petrunya <psergey(a)askmonty.org> branch nick: maria-5.1-table-elim-r5 timestamp: Fri 2009-08-14 00:44:52 +0400 message: MWL#17: Table elimination - More function renames, added comments === modified file 'sql/opt_table_elimination.cc' --- a/sql/opt_table_elimination.cc 2009-08-13 19:10:53 +0000 +++ b/sql/opt_table_elimination.cc 2009-08-13 20:44:52 +0000 @@ -93,11 +93,9 @@ /* - A field. - - Depends on table or equality - - Has expressions it participates as dependencies - - There is no counter, bound fields are in $list, not bound are not. + A table field. There is only one such object for any tblX.fieldY + - the field epends on its table and equalities + - expressions that use the field are its dependencies */ class Field_dep : public Func_dep { @@ -107,19 +105,23 @@ { type= Func_dep::FD_FIELD; } - /* Table we're from. It also has pointers to keys that we're part of */ - Table_dep *table; + + Table_dep *table; /* Table this field is from */ Field *field; + /* + Field_deps that belong to one table form a linked list. list members are + ordered by field_index + */ Field_dep *next_table_field; uint bitmap_offset; /* Offset of our part of the bitmap */ }; /* - A unique key. - - Depends on all its components - - Has its table as dependency + A Unique key. + - Unique key depends on all of its components + - Key's table is its dependency */ class Key_dep: public Func_dep { @@ -133,14 +135,15 @@ Table_dep *table; /* Table this key is from */ uint keyno; uint n_missing_keyparts; + /* Unique keys form a linked list, ordered by keyno */ Key_dep *next_table_key; }; /* - A table. - - Depends on any of its unique keys - - Has its fields and embedding outer join as dependency. + A table. 
+ - table depends on any of its unique keys + - has its fields and embedding outer join as dependency. */ class Table_dep : public Func_dep { @@ -151,16 +154,16 @@ type= Func_dep::FD_TABLE; } TABLE *table; - Field_dep *fields; /* Fields that belong to this table */ - Key_dep *keys; /* Unique keys */ - Outer_join_dep *outer_join_dep; + Field_dep *fields; /* Ordered list of fields that belong to this table */ + Key_dep *keys; /* Ordered list of Unique keys in this table */ + Outer_join_dep *outer_join_dep; /* Innermost eliminable outer join we're in */ }; /* - An outer join nest. - - Depends on all tables inside it. - - (And that's it). + An outer join nest that is subject to elimination + - it depends on all tables inside it + - has its parent outer join as dependency */ class Outer_join_dep: public Func_dep { @@ -171,14 +174,27 @@ { type= Func_dep::FD_OUTER_JOIN; } + /* + Outer join we're representing. This can be a join nest or a one table that + is outer join'ed. + */ TABLE_LIST *table_list; + + /* + Tables within this outer join (and its descendants) that are not yet known + to be functionally dependent. + */ table_map missing_tables; + /* All tables within this outer join and its descendants */ table_map all_tables; + /* Parent eliminable outer join, if any */ Outer_join_dep *parent; }; -/* TODO need this? 
*/ +/* + Table elimination context +*/ class Table_elimination { public: @@ -204,20 +220,22 @@ static -void build_funcdeps_for_cond(Table_elimination *te, Equality_dep **fdeps, - uint *and_level, Item *cond, - table_map usable_tables); +void build_eq_deps_for_cond(Table_elimination *te, Equality_dep **fdeps, + uint *and_level, Item *cond, + table_map usable_tables); static -void add_funcdep(Table_elimination *te, - Equality_dep **eq_dep, uint and_level, - Item_func *cond, Field *field, - bool eq_func, Item **value, - uint num_values, table_map usable_tables); +void add_eq_dep(Table_elimination *te, + Equality_dep **eq_dep, uint and_level, + Item_func *cond, Field *field, + bool eq_func, Item **value, + uint num_values, table_map usable_tables); static Equality_dep *merge_func_deps(Equality_dep *start, Equality_dep *new_fields, Equality_dep *end, uint and_level); -Field_dep *get_field_dep(Table_elimination *te, Field *field); +static Table_dep *get_table_dep(Table_elimination *te, TABLE *table); +static Field_dep *get_field_dep(Table_elimination *te, Field *field); + void eliminate_tables(JOIN *join); static void mark_as_eliminated(JOIN *join, TABLE_LIST *tbl); @@ -228,24 +246,25 @@ /*******************************************************************************************/ /* - Produce FUNC_DEP elements for the given item (i.e. condition) and add them - to fdeps array. + Produce Eq_dep elements for given condition. SYNOPSIS - build_funcdeps_for_cond() - fdeps INOUT Put created FUNC_DEP structures here - + build_eq_deps_for_cond() + te Table elimination context + fdeps INOUT Put produced equality conditions here + and_level INOUT AND-level (like in add_key_fields) + cond Condition to process + usable_tables Tables which fields we're interested in. That is, + Equality_dep represent "tbl.col=expr" and we'll + produce them only if tbl is in usable_tables. 
DESCRIPTION - a - - SEE ALSO - add_key_fields() - + This function is modeled after add_key_fields() */ + static -void build_funcdeps_for_cond(Table_elimination *te, - Equality_dep **fdeps, uint *and_level, Item *cond, - table_map usable_tables) +void build_eq_deps_for_cond(Table_elimination *te, Equality_dep **fdeps, + uint *and_level, Item *cond, + table_map usable_tables) { if (cond->type() == Item_func::COND_ITEM) { @@ -258,7 +277,7 @@ Item *item; while ((item=li++)) { - build_funcdeps_for_cond(te, fdeps, and_level, item, usable_tables); + build_eq_deps_for_cond(te, fdeps, and_level, item, usable_tables); } /* TODO: inject here a "if we have {t.col=const AND t.col=smth_else}, then @@ -270,13 +289,13 @@ else { (*and_level)++; - build_funcdeps_for_cond(te, fdeps, and_level, li++, usable_tables); + build_eq_deps_for_cond(te, fdeps, and_level, li++, usable_tables); Item *item; while ((item=li++)) { Equality_dep *start_key_fields= *fdeps; (*and_level)++; - build_funcdeps_for_cond(te, fdeps, and_level, item, usable_tables); + build_eq_deps_for_cond(te, fdeps, and_level, item, usable_tables); *fdeps= merge_func_deps(org_key_fields, start_key_fields, *fdeps, ++(*and_level)); } @@ -304,11 +323,11 @@ values--; DBUG_ASSERT(cond_func->functype() != Item_func::IN_FUNC || cond_func->argument_count() != 2); - add_funcdep(te, fdeps, *and_level, cond_func, - ((Item_field*)(cond_func->key_item()->real_item()))->field, - 0, values, - cond_func->argument_count()-1, - usable_tables); + add_eq_dep(te, fdeps, *and_level, cond_func, + ((Item_field*)(cond_func->key_item()->real_item()))->field, + 0, values, + cond_func->argument_count()-1, + usable_tables); } if (cond_func->functype() == Item_func::BETWEEN) { @@ -321,8 +340,8 @@ !(cond_func->arguments()[i]->used_tables() & OUTER_REF_TABLE_BIT)) { field_item= (Item_field *) (cond_func->arguments()[i]->real_item()); - add_funcdep(te, fdeps, *and_level, cond_func, - field_item->field, 0, values, 1, usable_tables); + add_eq_dep(te, fdeps, 
*and_level, cond_func, + field_item->field, 0, values, 1, usable_tables); } } } @@ -336,19 +355,19 @@ if (cond_func->arguments()[0]->real_item()->type() == Item::FIELD_ITEM && !(cond_func->arguments()[0]->used_tables() & OUTER_REF_TABLE_BIT)) { - add_funcdep(te, fdeps, *and_level, cond_func, - ((Item_field*)(cond_func->arguments()[0])->real_item())->field, - equal_func, - cond_func->arguments()+1, 1, usable_tables); + add_eq_dep(te, fdeps, *and_level, cond_func, + ((Item_field*)(cond_func->arguments()[0])->real_item())->field, + equal_func, + cond_func->arguments()+1, 1, usable_tables); } if (cond_func->arguments()[1]->real_item()->type() == Item::FIELD_ITEM && cond_func->functype() != Item_func::LIKE_FUNC && !(cond_func->arguments()[1]->used_tables() & OUTER_REF_TABLE_BIT)) { - add_funcdep(te, fdeps, *and_level, cond_func, - ((Item_field*)(cond_func->arguments()[1])->real_item())->field, - equal_func, - cond_func->arguments(),1,usable_tables); + add_eq_dep(te, fdeps, *and_level, cond_func, + ((Item_field*)(cond_func->arguments()[1])->real_item())->field, + equal_func, + cond_func->arguments(),1,usable_tables); } break; } @@ -360,10 +379,10 @@ Item *tmp=new Item_null; if (unlikely(!tmp)) // Should never be true return; - add_funcdep(te, fdeps, *and_level, cond_func, - ((Item_field*)(cond_func->arguments()[0])->real_item())->field, - cond_func->functype() == Item_func::ISNULL_FUNC, - &tmp, 1, usable_tables); + add_eq_dep(te, fdeps, *and_level, cond_func, + ((Item_field*)(cond_func->arguments()[0])->real_item())->field, + cond_func->functype() == Item_func::ISNULL_FUNC, + &tmp, 1, usable_tables); } break; case Item_func::OPTIMIZE_EQUAL: @@ -380,8 +399,8 @@ */ while ((item= it++)) { - add_funcdep(te, fdeps, *and_level, cond_func, item->field, - TRUE, &const_item, 1, usable_tables); + add_eq_dep(te, fdeps, *and_level, cond_func, item->field, + TRUE, &const_item, 1, usable_tables); } } else @@ -400,8 +419,8 @@ { if (!field->eq(item->field)) { - add_funcdep(te, fdeps, 
*and_level, cond_func, field/*item*/, - TRUE, (Item **) &item, 1, usable_tables); + add_eq_dep(te, fdeps, *and_level, cond_func, field, + TRUE, (Item **) &item, 1, usable_tables); } } it.rewind(); @@ -411,15 +430,19 @@ } } + /* - Perform an OR operation on two (adjacent) FUNC_DEP arrays. + Perform an OR operation on two (adjacent) Equality_dep arrays. SYNOPSIS merge_func_deps() + start Start of left OR-part + new_fields Start of right OR-part + end End of right OR-part + and_level AND-level. DESCRIPTION - - This function is invoked for two adjacent arrays of FUNC_DEP elements: + This function is invoked for two adjacent arrays of Equality_dep elements: $LEFT_PART $RIGHT_PART +-----------------------+-----------------------+ @@ -527,17 +550,18 @@ /* - Add a funcdep for a given equality. + Add an Equality_dep element for a given predicate, if applicable + + DESCRIPTION + This function is modeled after add_key_field(). */ static -void add_funcdep(Table_elimination *te, - Equality_dep **eq_dep, uint and_level, - Item_func *cond, Field *field, - bool eq_func, Item **value, - uint num_values, table_map usable_tables) +void add_eq_dep(Table_elimination *te, Equality_dep **eq_dep, + uint and_level, Item_func *cond, Field *field, + bool eq_func, Item **value, uint num_values, + table_map usable_tables) { - // Field *field= item_field->field; if (!(field->table->map & usable_tables)) return; @@ -606,7 +630,11 @@ } -Table_dep *get_table_dep(Table_elimination *te, TABLE *table) +/* + Get a Table_dep object for the given table, creating it if necessary. +*/ + +static Table_dep *get_table_dep(Table_elimination *te, TABLE *table) { Table_dep *tbl_dep= new Table_dep(table); Key_dep **key_list= &(tbl_dep->keys); @@ -625,19 +653,21 @@ return te->table_deps[table->tablenr] = tbl_dep; } + /* - Given a field, get its dependency element: if it already exists, find it, - otherwise create it. 
+ Get a Field_dep object for the given field, creating it if necessary */ -Field_dep *get_field_dep(Table_elimination *te, Field *field) +static Field_dep *get_field_dep(Table_elimination *te, Field *field) { TABLE *table= field->table; Table_dep *tbl_dep; + /* First, get the table*/ if (!(tbl_dep= te->table_deps[table->tablenr])) tbl_dep= get_table_dep(te, table); - + + /* Try finding the field in field list */ Field_dep **pfield= &(tbl_dep->fields); while (*pfield && (*pfield)->field->field_index < field->field_index) { @@ -646,20 +676,34 @@ if (*pfield && (*pfield)->field->field_index == field->field_index) return *pfield; + /* Create the field and insert it in the list */ Field_dep *new_field= new Field_dep(tbl_dep, field); - new_field->next_table_field= *pfield; *pfield= new_field; + return new_field; } +/* + Create an Outer_join_dep object for the given outer join + + DESCRIPTION + Outer_join_dep objects for children (or further descendants) are always + created before the parents. +*/ + +static Outer_join_dep *get_outer_join_dep(Table_elimination *te, TABLE_LIST *outer_join, table_map deps_map) { Outer_join_dep *oj_dep; oj_dep= new Outer_join_dep(outer_join, deps_map); - + + /* + Collect a bitmap fo tables that we depend on, and also set parent pointer + for descendant outer join elements. + */ Table_map_iterator it(deps_map); int idx; while ((idx= it.next_bit()) != Table_map_iterator::BITMAP_END) @@ -667,6 +711,11 @@ Table_dep *table_dep; if (!(table_dep= te->table_deps[idx])) { + /* + We get here only when ON expression had no references to inner tables + and Table_map objects weren't created for them. This is a rare/ + unimportant case so it's ok to do not too efficient searches. + */ TABLE *table= NULL; for (TABLE_LIST *tlist= te->join->select_lex->leaf_tables; tlist; tlist=tlist->next_leaf) @@ -680,7 +729,13 @@ DBUG_ASSERT(table); table_dep= get_table_dep(te, table); } - + + /* + Walk from the table up to its embedding outer joins. 
The goal is to + find the least embedded outer join nest and set its parent pointer to + point to the newly created Outer_join_dep. + to set the pointer of its near + */ if (!table_dep->outer_join_dep) table_dep->outer_join_dep= oj_dep; else @@ -690,43 +745,35 @@ oj= oj->parent; oj->parent=oj_dep; } - } return oj_dep; } /* - Perform table elimination in a given join list + Build functional dependency graph for elements of given join list SYNOPSIS collect_funcdeps_for_join_list() - te Table elimination context. - join_list Join list to work on - its_outer_join TRUE <=> the join_list is an inner side of an - outer join - FALSE <=> otherwise (this is top-level join - list, simplify_joins flattens out all - other kinds of join lists) - - tables_in_list Bitmap of tables embedded in the join_list. - tables_used_elsewhere Bitmap of tables that are referred to from - somewhere outside of the join list (e.g. - select list, HAVING, etc). + te Table elimination context. + join_list Join list to work on + build_eq_deps TRUE <=> build Equality_dep elements for all + members of the join list, even if they cannot + be individually eliminated + tables_used_elsewhere Bitmap of tables that are referred to from + somewhere outside of this join list (e.g. + select list, HAVING, ON expressions of parent + joins, etc). + eliminable_tables INOUT Tables that can potentially be eliminated + (needed so we know for which tables to build + dependencies for) + eq_dep INOUT End of array of equality dependencies. DESCRIPTION - Perform table elimination for a join list. - Try eliminating children nests first. - The "all tables in join nest can produce only one matching record - combination" property checking is modeled after constant table detection, - plus we reuse info attempts to eliminate child join nests. - - RETURN - Number of children left after elimination. 0 means everything was - eliminated. + . 
*/ -static uint +static void collect_funcdeps_for_join_list(Table_elimination *te, List<TABLE_LIST> *join_list, bool build_eq_deps, @@ -771,7 +818,7 @@ { // build comp_cond from ON expression uint and_level=0; - build_funcdeps_for_cond(te, eq_dep, &and_level, tbl->on_expr, + build_eq_deps_for_cond(te, eq_dep, &and_level, tbl->on_expr, *eliminable_tables); } @@ -781,19 +828,13 @@ tables_used_on_left |= tbl->on_expr->used_tables(); } } - return 0; + return; } + /* - Analyze exising FUNC_DEP array and add elements for tables and uniq keys - - SYNOPSIS - - DESCRIPTION - Add FUNC_DEP elements - - RETURN - . + This is used to analyse expressions in "tbl.col=expr" dependencies so + that we can figure out which fields the expression depends on. */ class Field_dependency_setter : public Field_enumerator @@ -819,20 +860,41 @@ return; } } - /* We didn't find the field. Bump the dependency anyway */ + /* + We got here if didn't find this field. It's not a part of + a unique key, and/or there is no field=expr element for it. + Bump the dependency anyway, this will signal that this dependency + cannot be satisfied. + */ te->equality_deps[expr_offset].unknown_args++; } } + Table_elimination *te; - uint expr_offset; /* Offset of the expression we're processing */ + /* Offset of the expression we're processing in the dependency bitmap */ + uint expr_offset; }; +/* + Setup equality dependencies + + SYNOPSIS + setup_equality_deps() + te Table elimination context + bound_deps_list OUT Start of linked list of elements that were found to + be bound (caller will use this to see if that + allows to declare further elements bound) +*/ + static bool setup_equality_deps(Table_elimination *te, Func_dep **bound_deps_list) { DBUG_ENTER("setup_equality_deps"); + /* + Count Field_dep objects and assign each of them a unique bitmap_offset. 
+ */ uint offset= 0; for (Table_dep **tbl_dep=te->table_deps; tbl_dep < te->table_deps + MAX_TABLES; @@ -859,7 +921,10 @@ bitmap_clear_all(&te->expr_deps); /* - Walk through all field=expr elements and collect all fields. + Analyze all "field=expr" dependencies, and have te->expr_deps encode + dependencies of expressions from fields. + + Also collect a linked list of equalities that are bound. */ Func_dep *bound_dep= NULL; Field_dependency_setter deps_setter(te);


[Maria-developers] Rev 2733: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09


At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2733
revision-id: psergey(a)askmonty.org-20090813191053-g1xfeieoti4bqgbc
parent: psergey(a)askmonty.org-20090813093613-hy7tdlsgdy83xszq
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 23:10:53 +0400
message:
  MWL#17: Table elimination
  - Better comments
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc  2009-08-13 09:36:13 +0000
+++ b/sql/opt_table_elimination.cc  2009-08-13 19:10:53 +0000
@@ -20,19 +20,16 @@
   OVERVIEW
 
   The module has one entry point - eliminate_tables() function, which one
-  needs to call (once) sometime after update_ref_and_keys() but before the
-  join optimization.
+  needs to call (once) at some point before the join optimization.
   eliminate_tables() operates over the JOIN structures. Logically, it
   removes the right sides of outer join nests. Physically, it changes the
   following members:
 
   * Eliminated tables are marked as constant and moved to the front of the
     join order.
+
   * In addition to this, they are recorded in JOIN::eliminated_tables bitmap.
 
-  * All join nests have their NESTED_JOIN::n_tables updated to discount
-    the eliminated tables
-
   * Items that became disused because they were in the ON expression of an
     eliminated outer join are notified by means of the Item tree walk which
     calls Item::mark_as_eliminated_processor for every item
@@ -40,26 +37,13 @@
     Item_subselect with its Item_subselect::eliminated flag which is used
     by EXPLAIN code to check if the subquery should be shown in EXPLAIN.
 
-  Table elimination is redone on every PS re-execution. (TODO reasons?)
+  Table elimination is redone on every PS re-execution.
 */
 
+
 /*
-  A structure that represents a functional dependency of something over
-  something else. This can be one of:
-
-  1. A "tbl.field = expr" equality. The field depends on the expression.
-
-  2. An Item_equal(...) multi-equality. Each participating field depends on
-  every other participating field. (TODO???)
-
-  3. A UNIQUE_KEY(field1, field2, fieldN). The key depends on the fields that
-  it is composed of.
-
-  4. A table (which is within an outer join nest). Table depends on a unique
-  key (value of a unique key identifies a table record)
-
-  5. An outer join nest. It depends on all tables it contains.
-
+  An abstract structure that represents some entity that's being dependent on
+  some other entity.
 */
 
 class Func_dep : public Sql_alloc
@@ -73,9 +57,14 @@
     FD_UNIQUE_KEY,
     FD_TABLE,
     FD_OUTER_JOIN
-  } type;
-  Func_dep *next;
-  bool bound;
+  } type; /* Type of the object */
+
+  /*
+    Used to make a linked list of elements that became bound and thus can
+    make elements that depend on them bound, too.
+  */
+  Func_dep *next;
+  bool bound; /* TRUE<=> The entity is considered bound */
   Func_dep() : next(NULL), bound(FALSE) {}
 };
 
@@ -84,10 +73,10 @@
 class Table_dep;
 class Outer_join_dep;
 
+
 /*
-  An equality
-  - Depends on multiple fields (those in its expression), unknown_args is a
-  counter of unsatisfied dependencies.
+  A "tbl.column= expr" equality dependency. tbl.column depends on fields
+  used in expr.
 */
 class Equality_dep : public Func_dep
 {
@@ -95,8 +84,11 @@
   Field_dep *field;
   Item *val;
 
-  uint level; /* Used during condition analysis only */
-  uint unknown_args; /* Number of yet unknown arguments */
+  /* Used during condition analysis only, similar to KEYUSE::level */
+  uint level;
+
+  /* Number of fields referenced from *val that are not yet 'bound' */
+  uint unknown_args;
 };
 
 
@@ -139,7 +131,7 @@
     type= Func_dep::FD_UNIQUE_KEY;
   }
   Table_dep *table; /* Table this key is from */
-  uint keyno; // TODO do we care about this
+  uint keyno;
   uint n_missing_keyparts;
   Key_dep *next_table_key;
 };
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc  2009-08-13 09:24:02 +0000
+++ b/sql/sql_select.cc  2009-08-13 19:10:53 +0000
@@ -114,7 +114,7 @@
                                 COND *conds, bool top);
 static bool check_interleaving_with_nj(JOIN_TAB *next);
 static void restore_prev_nj_state(JOIN_TAB *last);
-static void reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list);
+static uint reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list);
 static uint build_bitmap_for_nested_joins(List<TABLE_LIST> *join_list,
                                           uint first_unused);
 
@@ -8791,23 +8791,26 @@
     tables which will be ignored.
 */
 
-static void reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list)
+static uint reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list)
 {
   List_iterator<TABLE_LIST> li(*join_list);
   TABLE_LIST *table;
   DBUG_ENTER("reset_nj_counters");
+  uint n=0;
   while ((table= li++))
   {
     NESTED_JOIN *nested_join;
     if ((nested_join= table->nested_join))
     {
       nested_join->counter= 0;
-      nested_join->n_tables= my_count_bits(nested_join->used_tables &
-                                           ~join->eliminated_tables);
-      reset_nj_counters(join, &nested_join->join_list);
+      //nested_join->n_tables= my_count_bits(nested_join->used_tables &
+      //                                     ~join->eliminated_tables);
+      nested_join->n_tables= reset_nj_counters(join, &nested_join->join_list);
     }
+    if (table->table && (table->table->map & ~join->eliminated_tables))
+      n++;
   }
-  DBUG_VOID_RETURN;
+  DBUG_RETURN(n);
 }
 
 
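
The reset_nj_counters() change above can be illustrated outside the server. The following is only a minimal sketch: JoinElem and reset_counters() are hypothetical, simplified stand-ins for TABLE_LIST/NESTED_JOIN and the real function, and the counting rule is read off the diff (a nest's n_tables becomes the number of non-eliminated base tables found directly in its join list).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the server's table bitmap type.
using table_map = std::uint64_t;

// Hypothetical, simplified join-list element: either a base table
// (map != 0, no children) or a join nest (map == 0, children non-empty).
struct JoinElem {
  table_map map = 0;               // bit of a base table; 0 for a nest
  std::vector<JoinElem> children;  // members of the nest, if any
  unsigned n_tables = 0;           // filled in for nests only
};

// Returns the number of non-eliminated base tables directly in 'list'.
// As a side effect every nest gets n_tables set from the recursive call,
// mirroring "nested_join->n_tables= reset_nj_counters(...)" in the diff.
unsigned reset_counters(std::vector<JoinElem> &list, table_map eliminated) {
  unsigned n = 0;
  for (JoinElem &e : list) {
    // In this sketch a nest never carries a table bit of its own.
    assert(e.children.empty() || e.map == 0);
    if (!e.children.empty())
      e.n_tables = reset_counters(e.children, eliminated);
    if (e.map & ~eliminated)       // base table that was not eliminated
      ++n;
  }
  return n;
}
```

With a top-level list of t1 plus a nest (t2, t3) and t3 eliminated, the call returns 1 for the top level and stores 1 in the nest; nothing here models the counter field or any other JOIN state.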


[Maria-developers] Rev 2732: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09


At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2732
revision-id: psergey(a)askmonty.org-20090813093613-hy7tdlsgdy83xszq
parent: psergey(a)askmonty.org-20090813092402-jlqucf6nultxlv4b
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 13:36:13 +0400
message:
  MWL#17: Table elimination
  Fixes after post-review fixes:
  - Don't search for tables in JOIN_TAB array. it's not initialized yet.
    use select_lex->leaf_tables instead.
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc  2009-08-13 00:01:43 +0000
+++ b/sql/opt_table_elimination.cc  2009-08-13 09:36:13 +0000
@@ -676,16 +676,12 @@
     if (!(table_dep= te->table_deps[idx]))
     {
       TABLE *table= NULL;
-      /*
-        Locate and create the table. The search isnt very efficient but
-        typically we won't get here as we process the ON expression first
-        and that will create the Table_dep
-      */
-      for (uint i= 0; i < te->join->tables; i++)
+      for (TABLE_LIST *tlist= te->join->select_lex->leaf_tables; tlist;
+           tlist=tlist->next_leaf)
       {
-        if (te->join->join_tab[i].table->tablenr == (uint)idx)
+        if (tlist->table->tablenr == (uint)idx)
         {
-          table= te->join->join_tab[i].table;
+          table=tlist->table;
           break;
         }
       }


[Maria-developers] Rev 2731: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09


At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2731
revision-id: psergey(a)askmonty.org-20090813092402-jlqucf6nultxlv4b
parent: psergey(a)askmonty.org-20090813000143-dukzk352hjywidk7
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 13:24:02 +0400
message:
  MWL#17: Table elimination
  - Post-postreview changes fix: Do set NESTED_JOIN::n_tables to number of
    tables left after elimination.
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc  2009-08-12 23:43:02 +0000
+++ b/sql/sql_select.cc  2009-08-13 09:24:02 +0000
@@ -114,7 +114,7 @@
                                 COND *conds, bool top);
 static bool check_interleaving_with_nj(JOIN_TAB *next);
 static void restore_prev_nj_state(JOIN_TAB *last);
-static void reset_nj_counters(List<TABLE_LIST> *join_list);
+static void reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list);
 static uint build_bitmap_for_nested_joins(List<TABLE_LIST> *join_list,
                                           uint first_unused);
 
@@ -1011,7 +1011,7 @@
     DBUG_RETURN(1);
   }
 
-  reset_nj_counters(join_list);
+  reset_nj_counters(this, join_list);
   make_outerjoin_info(this);
 
   /*
@@ -4625,7 +4625,7 @@
   DBUG_ENTER("choose_plan");
 
   join->cur_embedding_map= 0;
-  reset_nj_counters(join->join_list);
+  reset_nj_counters(join, join->join_list);
   /*
     if (SELECT_STRAIGHT_JOIN option is set)
       reorder tables so dependent tables come after tables they depend
@@ -8791,7 +8791,7 @@
     tables which will be ignored.
 */
 
-static void reset_nj_counters(List<TABLE_LIST> *join_list)
+static void reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list)
 {
   List_iterator<TABLE_LIST> li(*join_list);
   TABLE_LIST *table;
@@ -8802,7 +8802,9 @@
     if ((nested_join= table->nested_join))
     {
       nested_join->counter= 0;
-      reset_nj_counters(&nested_join->join_list);
+      nested_join->n_tables= my_count_bits(nested_join->used_tables &
+                                           ~join->eliminated_tables);
+      reset_nj_counters(join, &nested_join->join_list);
     }
   }
   DBUG_VOID_RETURN;


[Maria-developers] Rev 2730: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 13 Aug '09


At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2730
revision-id: psergey(a)askmonty.org-20090813000143-dukzk352hjywidk7
parent: psergey(a)askmonty.org-20090812234302-10es7qmf0m09ahbq
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 04:01:43 +0400
message:
  MWL#17: Table elimination
  - When making inferences "field is bound" -> "key is bound", do check
    that the field is part of the key
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc  2009-08-12 23:43:02 +0000
+++ b/sql/opt_table_elimination.cc  2009-08-13 00:01:43 +0000
@@ -1043,7 +1043,8 @@
           DBUG_PRINT("info", ("key %s.%s is now bound",
                               key_dep->table->table->alias,
                               key_dep->table->table->key_info[key_dep->keyno].name));
-          if (!key_dep->bound)
+          if (field_dep->field->part_of_key.is_set(key_dep->keyno) &&
+              !key_dep->bound)
           {
             if (!--key_dep->n_missing_keyparts)
             {
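
To make the point of this fix concrete, here is a small sketch of the "field is bound" -> "key is bound" inference with the part-of-key guard in place. KeyDep, FieldDep and on_field_bound() are hypothetical, simplified stand-ins and not the server's Key_dep/Field_dep classes; std::bitset plays the role of Field::part_of_key.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for Key_dep.
struct KeyDep {
  unsigned keyno;               // index number of the unique key
  unsigned n_missing_keyparts;  // key parts that are not bound yet
  bool bound = false;
};

// Hypothetical, simplified stand-in for Field_dep.
struct FieldDep {
  std::bitset<64> part_of_key;  // bit i set <=> the field is a part of key i
};

// Called when 'field' has just become bound. Only keys that actually contain
// the field get their counter decremented; without that guard an unrelated
// field would wrongly count towards binding the key.
// Returns pointers to the keys that became bound as a result.
std::vector<KeyDep *> on_field_bound(const FieldDep &field,
                                     std::vector<KeyDep> &table_keys) {
  std::vector<KeyDep *> newly_bound;
  for (KeyDep &key : table_keys) {
    if (!field.part_of_key.test(key.keyno) || key.bound)
      continue;
    assert(key.n_missing_keyparts > 0);
    if (--key.n_missing_keyparts == 0) {
      key.bound = true;
      newly_bound.push_back(&key);
    }
  }
  return newly_bound;
}
```

In the sketch, binding a field that belongs only to key 1 leaves key 0 untouched, which is the behaviour the added part_of_key.is_set() check is meant to guarantee.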


[Maria-developers] Rev 2729: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 12 Aug '09


At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2729
revision-id: psergey(a)askmonty.org-20090812234302-10es7qmf0m09ahbq
parent: psergey(a)askmonty.org-20090812223421-w4xyzj7azqgo83ps
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 03:43:02 +0400
message:
  MWL#17: Table elimination
  - Continue addressing review feedback: remove "unusable KEYUSEs"
    extension as it is no longer needed.
=== modified file 'sql/item.h'
--- a/sql/item.h  2009-08-12 22:34:21 +0000
+++ b/sql/item.h  2009-08-12 23:43:02 +0000
@@ -1017,18 +1017,6 @@
   bool eq_by_collation(Item *item, bool binary_cmp, CHARSET_INFO *cs);
 };
 
-#if 0
-typedef struct
-{
-  TABLE *table; /* Table of interest */
-  uint keyno; /* Index of interest */
-  uint forbidden_part; /* key part which one is not allowed to refer to */
-  /* [Set by processor] used tables, besides the table of interest */
-  table_map used_tables;
-  /* [Set by processor] Parts of index of interest that expression refers to */
-  uint needed_key_parts;
-} Field_processor_info;
-#endif
 
 /* Data for Item::check_column_usage_processor */
 class Field_enumerator
=== modified file 'sql/opt_table_elimination.cc'
--- a/sql/opt_table_elimination.cc  2009-08-12 22:34:21 +0000
+++ b/sql/opt_table_elimination.cc  2009-08-12 23:43:02 +0000
@@ -1119,7 +1119,6 @@
       case Func_dep::FD_OUTER_JOIN:
       {
         Outer_join_dep *outer_join_dep= (Outer_join_dep*)bound_dep;
-        /* TODO what do here? Stop if eliminated the top-level? */
         mark_as_eliminated(te.join, outer_join_dep->table_list);
         Outer_join_dep *parent= outer_join_dep->parent;
         if (parent &&
@@ -1236,38 +1235,6 @@
 
 #endif
 
-/***********************************************************************************************/
-
-#if 0
-static void dbug_print_fdep(FUNC_DEP *fd)
-{
-  switch (fd->type) {
-    case FUNC_DEP::FD_OUTER_JOIN:
-    {
-      fprintf(DBUG_FILE, "outer_join(");
-      if (fd->table_list->nested_join)
-      {
-        bool first= TRUE;
-        List_iterator<TABLE_LIST> it(fd->table_list->nested_join->join_list);
-        TABLE_LIST *tbl;
-        while ((tbl= it++))
-        {
-          fprintf(DBUG_FILE, "%s%s", first?"":" ",
-                  tbl->table? tbl->table->alias : "...");
-          first= FALSE;
-        }
-        fprintf(DBUG_FILE, ")");
-      }
-      else
-        fprintf(DBUG_FILE, "%s", fd->table_list->table->alias);
-      fprintf(DBUG_FILE, ")");
-      break;
-    }
-  }
-}
-
-#endif
-
 /**
   @} (end of group Table_Elimination)
 */
=== modified file 'sql/sql_select.cc'
--- a/sql/sql_select.cc  2009-06-30 13:11:00 +0000
+++ b/sql/sql_select.cc  2009-08-12 23:43:02 +0000
@@ -2474,7 +2474,6 @@
   DBUG_RETURN(HA_POS_ERROR); /* This shouldn't happend */
 }
 
-
 /*
   This structure is used to collect info on potentially sargable
   predicates in order to check whether they become sargable after
@@ -2762,16 +2761,14 @@
         {
           start_keyuse=keyuse;
           key=keyuse->key;
-          if (keyuse->type == KEYUSE_USABLE)
-            s->keys.set_bit(key); // QQ: remove this ?
+          s->keys.set_bit(key); // QQ: remove this ?
 
           refs=0;
           const_ref.clear_all();
           eq_part.clear_all();
           do
           {
-            if (keyuse->type == KEYUSE_USABLE &&
-                keyuse->val->type() != Item::NULL_ITEM && !keyuse->optimize)
+            if (keyuse->val->type() != Item::NULL_ITEM && !keyuse->optimize)
             {
               if (!((~found_const_table_map) & keyuse->used_tables))
                 const_ref.set_bit(keyuse->keypart);
@@ -2971,9 +2968,11 @@
   */
   bool null_rejecting;
   bool *cond_guard; /* See KEYUSE::cond_guard */
-  enum keyuse_type type; /* See KEYUSE::type */
 } KEY_FIELD;
 
+/* Values in optimize */
+#define KEY_OPTIMIZE_EXISTS 1
+#define KEY_OPTIMIZE_REF_OR_NULL 2
 
 /**
   Merge new key definitions to old ones, remove those not used in both.
@@ -3064,18 +3063,13 @@
                              KEY_OPTIMIZE_REF_OR_NULL));
             old->null_rejecting= (old->null_rejecting &&
                                   new_fields->null_rejecting);
-            /*
-              The conditions are the same, hence their usabilities should
-              be, too (TODO: shouldn't that apply to the above
-              null_rejecting and optimize attributes?)
-            */
-            DBUG_ASSERT(old->type == new_fields->type);
           }
         }
         else if (old->eq_func && new_fields->eq_func &&
                  old->val->eq_by_collation(new_fields->val,
                                            old->field->binary(),
                                            old->field->charset()))
+
         {
           old->level= and_level;
           old->optimize= ((old->optimize & new_fields->optimize &
@@ -3084,15 +3078,10 @@
                            KEY_OPTIMIZE_REF_OR_NULL));
           old->null_rejecting= (old->null_rejecting &&
                                 new_fields->null_rejecting);
-          // "t.key_col=const" predicates are always usable
-          DBUG_ASSERT(old->type == KEYUSE_USABLE &&
-                      new_fields->type == KEYUSE_USABLE);
         }
         else if (old->eq_func && new_fields->eq_func &&
-                 ((new_fields->type == KEYUSE_USABLE &&
-                   old->val->const_item() && old->val->is_null()) ||
-                  ((old->type == KEYUSE_USABLE && new_fields->val->is_null()))))
-          /* TODO ^ why is the above asymmetric, why const_item()? */
+                 ((old->val->const_item() && old->val->is_null()) ||
+                  new_fields->val->is_null()))
         {
           /* field = expression OR field IS NULL */
           old->level= and_level;
@@ -3163,7 +3152,6 @@
               table_map usable_tables, SARGABLE_PARAM **sargables)
 {
   uint exists_optimize= 0;
-  bool optimizable=0;
   if (!(field->flags & PART_KEY_FLAG))
   {
     // Don't remove column IS NULL on a LEFT JOIN table
@@ -3176,12 +3164,15 @@
   else
   {
     table_map used_tables=0;
+    bool optimizable=0;
     for (uint i=0; i<num_values; i++)
     {
       used_tables|=(value[i])->used_tables();
       if (!((value[i])->used_tables() & (field->table->map | RAND_TABLE_BIT)))
         optimizable=1;
     }
+    if (!optimizable)
+      return;
     if (!(usable_tables & field->table->map))
     {
       if (!eq_func || (*value)->type() != Item::NULL_ITEM ||
@@ -3194,8 +3185,7 @@
     JOIN_TAB *stat=field->table->reginfo.join_tab;
     key_map possible_keys=field->key_start;
     possible_keys.intersect(field->table->keys_in_use_for_query);
-    if (optimizable)
-      stat[0].keys.merge(possible_keys); // Add possible keys
+    stat[0].keys.merge(possible_keys); // Add possible keys
 
     /*
       Save the following cases:
@@ -3288,7 +3278,6 @@
   (*key_fields)->val= *value;
   (*key_fields)->level= and_level;
   (*key_fields)->optimize= exists_optimize;
-  (*key_fields)->type= optimizable? KEYUSE_USABLE : KEYUSE_UNKNOWN;
   /*
     If the condition has form "tbl.keypart = othertbl.field" and
     othertbl.field can be NULL, there will be no matches if othertbl.field
@@ -3600,7 +3589,6 @@
           keyuse.optimize= key_field->optimize & KEY_OPTIMIZE_REF_OR_NULL;
           keyuse.null_rejecting= key_field->null_rejecting;
           keyuse.cond_guard= key_field->cond_guard;
-          keyuse.type= key_field->type;
           VOID(insert_dynamic(keyuse_array,(uchar*) &keyuse));
         }
       }
@@ -3609,6 +3597,7 @@
 }
 
 
+#define FT_KEYPART (MAX_REF_PARTS+10)
 
 static void
 add_ft_keys(DYNAMIC_ARRAY *keyuse_array,
@@ -3667,7 +3656,6 @@
   keyuse.used_tables=cond_func->key_item()->used_tables();
   keyuse.optimize= 0;
   keyuse.keypart_map= 0;
-  keyuse.type= KEYUSE_USABLE;
   VOID(insert_dynamic(keyuse_array,(uchar*) &keyuse));
 }
 
@@ -3682,13 +3670,6 @@
     return (int) (a->key - b->key);
   if (a->keypart != b->keypart)
     return (int) (a->keypart - b->keypart);
-
-  // Usable ones go before the unusable
-  int a_ok= test(a->type == KEYUSE_USABLE);
-  int b_ok= test(b->type == KEYUSE_USABLE);
-  if (a_ok != b_ok)
-    return a_ok? -1 : 1;
-
   // Place const values before other ones
   if ((res= test((a->used_tables & ~OUTER_REF_TABLE_BIT)) -
        test((b->used_tables & ~OUTER_REF_TABLE_BIT))))
@@ -3899,8 +3880,7 @@
     found_eq_constant=0;
     for (i=0 ; i < keyuse->elements-1 ; i++,use++)
     {
-      if (use->type == KEYUSE_USABLE && !use->used_tables &&
-          use->optimize != KEY_OPTIMIZE_REF_OR_NULL)
+      if (!use->used_tables && use->optimize != KEY_OPTIMIZE_REF_OR_NULL)
         use->table->const_key_parts[use->key]|= use->keypart_map;
       if (use->keypart != FT_KEYPART)
       {
@@ -3924,8 +3904,7 @@
       /* Save ptr to first use */
       if (!use->table->reginfo.join_tab->keyuse)
         use->table->reginfo.join_tab->keyuse=save_pos;
-      if (use->type == KEYUSE_USABLE)
-        use->table->reginfo.join_tab->checked_keys.set_bit(use->key);
+      use->table->reginfo.join_tab->checked_keys.set_bit(use->key);
       save_pos++;
     }
     i=(uint) (save_pos-(KEYUSE*) keyuse->buffer);
@@ -3955,7 +3934,7 @@
       To avoid bad matches, we don't make ref_table_rows less than 100.
     */
     keyuse->ref_table_rows= ~(ha_rows) 0; // If no ref
-    if (keyuse->type == KEYUSE_USABLE && keyuse->used_tables &
+    if (keyuse->used_tables &
         (map= (keyuse->used_tables & ~join->const_table_map &
                ~OUTER_REF_TABLE_BIT)))
     {
@@ -4147,8 +4126,7 @@
               if 1. expression doesn't refer to forward tables
                  2. we won't get two ref-or-null's
             */
-            if (keyuse->type == KEYUSE_USABLE &&
-                !(remaining_tables & keyuse->used_tables) &&
+            if (!(remaining_tables & keyuse->used_tables) &&
                 !(ref_or_null_part && (keyuse->optimize &
                                        KEY_OPTIMIZE_REF_OR_NULL)))
             {
@@ -5602,8 +5580,7 @@
   */
   do
   {
-    if (!(~used_tables & keyuse->used_tables) &&
-        keyuse->type == KEYUSE_USABLE)
+    if (!(~used_tables & keyuse->used_tables))
     {
       if (keyparts == keyuse->keypart &&
           !(found_part_ref_or_null & keyuse->optimize))
@@ -5653,11 +5630,9 @@
     uint i;
     for (i=0 ; i < keyparts ; keyuse++,i++)
     {
-      while (keyuse->keypart != i || ((~used_tables) & keyuse->used_tables) ||
-             !(keyuse->type == KEYUSE_USABLE))
-      {
+      while (keyuse->keypart != i ||
+             ((~used_tables) & keyuse->used_tables))
         keyuse++; /* Skip other parts */
-      }
 
       uint maybe_null= test(keyinfo->key_part[i].null_bit);
       j->ref.items[i]=keyuse->val; // Save for cond removal
=== modified file 'sql/sql_select.h'
--- a/sql/sql_select.h  2009-06-30 13:11:00 +0000
+++ b/sql/sql_select.h  2009-08-12 23:43:02 +0000
@@ -28,45 +28,6 @@
 #include "procedure.h"
 #include <myisam.h>
 
-#define FT_KEYPART (MAX_REF_PARTS+10)
-/* Values in optimize */
-#define KEY_OPTIMIZE_EXISTS 1
-#define KEY_OPTIMIZE_REF_OR_NULL 2
-
-/* KEYUSE element types */
-enum keyuse_type
-{
-  /*
-    val refers to the same table, this is either KEYUSE_BIND or KEYUSE_NO_BIND
-    type, we didn't determine which one yet.
-  */
-  KEYUSE_UNKNOWN= 0,
-  /*
-    'regular' keyuse, i.e. it represents one of the following
-      * t.keyXpartY = func(constants, other-tables)
-      * t.keyXpartY IS NULL
-      * t.keyXpartY = func(constants, other-tables) OR t.keyXpartY IS NULL
-    and can be used to construct ref acces
-  */
-  KEYUSE_USABLE,
-  /*
-    The keyuse represents a condition in form:
-
-      t.uniq_keyXpartY = func(other parts of uniq_keyX)
-
-    This can't be used to construct uniq_keyX but we could use it to determine
-    that the table will produce at most one match.
-  */
-  KEYUSE_BIND,
-  /*
-    Keyuse that's not usable for ref access and doesn't meet the criteria of
-    KEYUSE_BIND. Examples:
-      t.keyXpartY = func(t.keyXpartY)
-      t.keyXpartY = func(column of t that's not covered by keyX)
-  */
-  KEYUSE_NO_BIND
-};
-
 typedef struct keyuse_t {
   TABLE *table;
   Item *val; /**< or value if no field */
@@ -90,15 +51,6 @@
     NULL - Otherwise (the source equality can't be turned off)
   */
   bool *cond_guard;
-  /*
-    1 <=> This keyuse can be used to construct key access.
-    0 <=> Otherwise. Currently unusable KEYUSEs represent equalities
-          where one table column refers to another one, like this:
-            t.keyXpartA=func(t.keyXpartB)
-          This equality cannot be used for index access but is useful
-          for table elimination.
-  */
-  enum keyuse_type type;
 } KEYUSE;
 
 class store_key;
@@ -258,7 +210,7 @@
   JOIN *join;
   /** Bitmap of nested joins this table is part of */
   nested_join_map embedding_map;
-
+
   void cleanup();
   inline bool is_using_loose_index_scan()
   {


[Maria-developers] Rev 2728: MWL#17: Table elimination in file:///home/psergey/dev/maria-5.1-table-elim-r5/
by Sergey Petrunya 12 Aug '09


At file:///home/psergey/dev/maria-5.1-table-elim-r5/
------------------------------------------------------------
revno: 2728
revision-id: psergey(a)askmonty.org-20090812223421-w4xyzj7azqgo83ps
parent: psergey(a)askmonty.org-20090708171038-9nyc3hcg1o7h8635
committer: Sergey Petrunya <psergey(a)askmonty.org>
branch nick: maria-5.1-table-elim-r5
timestamp: Thu 2009-08-13 02:34:21 +0400
message:
  MWL#17: Table elimination
  Address review feedback:
  - Change from Wave-based approach (a-la const table detection) to
    building and walking functional dependency graph.
  - Change from piggy-backing on ref-access code and KEYUSE structures
    to using our own expression analyzer.

Diff too large for email (1602 lines, the limit is 1000).
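
Since the diff itself is not included in this mail, the following is only a rough sketch of what "building and walking a functional dependency graph" can look like in general; it is not taken from the commit. Node and propagate_bound() are hypothetical names. Each node keeps a counter of dependencies that are not bound yet (compare unknown_args and n_missing_keyparts in the later revisions), and a worklist of newly bound nodes drives the walk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dependency graph element (equality, field,
// unique key, table, ...).
struct Node {
  unsigned unknown_args = 0;            // dependencies that are not bound yet
  bool bound = false;
  std::vector<std::size_t> dependents;  // indexes of nodes depending on this one
};

// Worklist walk: starting from nodes known to be bound, mark a dependent as
// bound once its counter of unbound dependencies drops to zero, and then
// process its dependents in turn. A node that needs "any one of" its inputs
// can be modelled here by starting its counter at 1.
void propagate_bound(std::vector<Node> &graph,
                     std::vector<std::size_t> worklist) {
  for (std::size_t i : worklist)
    graph[i].bound = true;
  while (!worklist.empty()) {
    std::size_t cur = worklist.back();
    worklist.pop_back();
    for (std::size_t d : graph[cur].dependents) {
      Node &n = graph[d];
      if (n.bound)
        continue;
      assert(n.unknown_args > 0);
      if (--n.unknown_args == 0) {
        n.bound = true;
        worklist.push_back(d);
      }
    }
  }
}
```

Compared with repeatedly rescanning all candidates until nothing changes (the "wave" style), each edge is visited at most once here, which is presumably part of the appeal of the graph formulation.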


[Maria-developers] Buildbot and compiler warnings
by Kristian Nielsen 12 Aug '09


I have now implemented and installed in our Buildbot enhanced facilities for
dealing with compiler warnings.

We already have a file support-files/compiler_warnings.supp, which I think is
used by PushBuild @ MySQL. The new facilities in our Buildbot use the same
file to suppress certain warnings that for some reason cannot be removed or
are not desirable to remove.

See for example:

    https://askmonty.org/buildbot/waterfall?branch=5.1
    https://askmonty.org/buildbot/builders/hardy-amd64-valgrind/builds/113/step…

So there are still a few warnings that need to be eliminated, patches welcome
:-)

Note that old builds from earlier than today still have the old log files,
without these new warning facilities.

It would be great to get us to compile without any warnings. The Drizzle
people already compile with -pedantic -Werror, so we are trailing behind
there!

 - Kristian.


[Maria-developers] Updated (by Guest): options for CREATE TABLE (43)
by worklog-noreply＠askmonty.org 11 Aug '09


-----------------------------------------------------------------------
                            WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: options for CREATE TABLE
CREATION DATE..: Tue, 11 Aug 2009, 17:02
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....: Sanja
COPIES TO......: Monty
CATEGORY.......: Server-BackLog
TASK ID........: 43 (http://askmonty.org/worklog/?tid=43)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 32 (hours remain)
ORIG. ESTIMATE.: 32

PROGRESS NOTES:

-=-=(Guest - Tue, 11 Aug 2009, 19:57)=-=-
High-Level Specification modified.
--- /tmp/wklog.43.old.31856 2009-08-11 19:57:38.000000000 +0300
+++ /tmp/wklog.43.new.31856 2009-08-11 19:57:38.000000000 +0300
@@ -5,8 +5,43 @@
    key key1(field) key_opt1=kval1 key_opt2=kval2)
   table_option1=tval1, table_option2=tval2;
 
-Exclusion should be made for old table and key (KEY_BLOCK_SIZE) options where
+Exclusion should be made for old table and key options where
 '=' was not obligatory.
 
+Old key options:
+KEY_BLOCK_SIZE <num> -> KEY_BLOCK_SIZE=num
+WITH PARSER <name> -> PARSER=name
+
+Old table options:
+ENGINE name -> ENGINE=name
+TYPE name -> TYPE=name
+MAX_ROWS num -> MAX_ROWS=num
+MIX_ROWS num -> MIX_ROWS=num
+AVG_ROW_LENGTH num -> AVG_ROW_LENGTH=num
+PASSWORD string -> PASSWORD=string
+COMMENT string -> COMMENT=string
+AUTO_INCREMENT num -> AUTO_INCREMENT=num
+PACK_KEYS num/default -> PACK_KEYS=num/default
+CHECKSUM num -> CHECKSUM=num
+TABLE_CHECKSUM num -> TABLE_CHECKSUM=num
+PAGE_CHECKSUM num -> PAGE_CHECKSUM=num
+DELAY_KEY_WRITE num -> DELAY_KEY_WRITE=num
+ROW_FORMAT name -> ROW_FORMAT=name
+INSERT_METHOD name -> INSERT_METHOD=name
+KEY_BLOCK_SIZE num -> KEY_BLOCK_SIZE=num
+TRANSACTIONAL num -> TRANSACTIONAL=num
+
+Table options which will be left hardcoded
+UNION
+default charset
+default collation
+DATA DIRECTORY
+TABLESPACE
+STORAGE
+
 For fields options can go with field attributes (NOT NULL, UNIQUE and so on) can
 be separated from them by '=' sign.
+
+
+
+

-=-=(Guest - Tue, 11 Aug 2009, 19:36)=-=-
High-Level Specification modified.
--- /tmp/wklog.43.old.30883 2009-08-11 19:36:45.000000000 +0300
+++ /tmp/wklog.43.new.30883 2009-08-11 19:36:45.000000000 +0300
@@ -1 +1,12 @@
 
+Table definition ca looks like following
+CREATE TABLE table
+  (field int ... field_opt1=fval1 field_opt2=fval2,
+   key key1(field) key_opt1=kval1 key_opt2=kval2)
+  table_option1=tval1, table_option2=tval2;
+
+Exclusion should be made for old table and key (KEY_BLOCK_SIZE) options where
+'=' was not obligatory.
+
+For fields options can go with field attributes (NOT NULL, UNIQUE and so on) can
+be separated from them by '=' sign.


DESCRIPTION:

Add the ability to create a table with additional options that can be passed
to the engine. Also make existing options such as TRANSACTIONAL work via this
mechanism.


HIGH-LEVEL SPECIFICATION:

A table definition can look like the following:

CREATE TABLE table
  (field int ... field_opt1=fval1 field_opt2=fval2,
   key key1(field) key_opt1=kval1 key_opt2=kval2)
  table_option1=tval1, table_option2=tval2;

An exception should be made for old table and key options, where
'=' was not obligatory.

Old key options:
KEY_BLOCK_SIZE <num> -> KEY_BLOCK_SIZE=num
WITH PARSER <name> -> PARSER=name

Old table options:
ENGINE name -> ENGINE=name
TYPE name -> TYPE=name
MAX_ROWS num -> MAX_ROWS=num
MIN_ROWS num -> MIN_ROWS=num
AVG_ROW_LENGTH num -> AVG_ROW_LENGTH=num
PASSWORD string -> PASSWORD=string
COMMENT string -> COMMENT=string
AUTO_INCREMENT num -> AUTO_INCREMENT=num
PACK_KEYS num/default -> PACK_KEYS=num/default
CHECKSUM num -> CHECKSUM=num
TABLE_CHECKSUM num -> TABLE_CHECKSUM=num
PAGE_CHECKSUM num -> PAGE_CHECKSUM=num
DELAY_KEY_WRITE num -> DELAY_KEY_WRITE=num
ROW_FORMAT name -> ROW_FORMAT=name
INSERT_METHOD name -> INSERT_METHOD=name
KEY_BLOCK_SIZE num -> KEY_BLOCK_SIZE=num
TRANSACTIONAL num -> TRANSACTIONAL=num

Table options which will be left hardcoded:
UNION
default charset
default collation
DATA DIRECTORY
TABLESPACE
STORAGE

For fields, options can go together with field attributes (NOT NULL, UNIQUE
and so on) and can be distinguished from them by the '=' sign.


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
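
As an illustration of the old-to-new option mapping in the specification
above, here is a minimal Python sketch. It is not the server implementation,
only a way to see the intended rewrite. The names normalize_options and
LEGACY_OPTIONS are assumptions made for this example, and the list covers only
the options with simple numeric or name values; quoted-string options (COMMENT,
PASSWORD) and the WITH PARSER form are deliberately left out.

```python
import re

# Hypothetical subset of the "old" options from the specification whose
# values are a single word or number.
LEGACY_OPTIONS = (
    "KEY_BLOCK_SIZE", "ENGINE", "TYPE", "MAX_ROWS", "AVG_ROW_LENGTH",
    "AUTO_INCREMENT", "PACK_KEYS", "TABLE_CHECKSUM", "PAGE_CHECKSUM",
    "CHECKSUM", "DELAY_KEY_WRITE", "ROW_FORMAT", "INSERT_METHOD",
    "TRANSACTIONAL",
)

# Option name, optional '=', then a one-word value.
_LEGACY_RE = re.compile(
    r"\b(" + "|".join(LEGACY_OPTIONS) + r")\b\s*=?\s*(\w+)",
    re.IGNORECASE,
)


def normalize_options(option_text: str) -> str:
    """Rewrite 'NAME value' as 'NAME=value'; 'NAME=value' stays equivalent.

    Only meant for the table/key option part of a statement, not for a
    whole CREATE TABLE (AUTO_INCREMENT is also a column attribute).
    """
    return _LEGACY_RE.sub(
        lambda m: f"{m.group(1).upper()}={m.group(2)}", option_text
    )


if __name__ == "__main__":
    print(normalize_options("ENGINE MyISAM MAX_ROWS 100"))
    print(normalize_options("table_option1=tval1, ENGINE Aria"))
```

Options that already use '=' pass through unchanged apart from the name being
upper-cased, so new-style options such as table_option1=tval1 are not touched.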


[Maria-developers] Updated (by Guest): options for CREATE TABLE (43)
by worklog-noreply＠askmonty.org 11 Aug '09

11 Aug '09

-----------------------------------------------------------------------
                            WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: options for CREATE TABLE
CREATION DATE..: Tue, 11 Aug 2009, 17:02
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....: Sanja
COPIES TO......: Monty
CATEGORY.......: Server-BackLog
TASK ID........: 43 (http://askmonty.org/worklog/?tid=43)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 32 (hours remain)
ORIG. ESTIMATE.: 32

PROGRESS NOTES:

-=-=(Guest - Tue, 11 Aug 2009, 19:36)=-=-
High-Level Specification modified.
--- /tmp/wklog.43.old.30883 2009-08-11 19:36:45.000000000 +0300
+++ /tmp/wklog.43.new.30883 2009-08-11 19:36:45.000000000 +0300
@@ -1 +1,12 @@
 
+Table definition ca looks like following
+CREATE TABLE table
+  (field int ... field_opt1=fval1 field_opt2=fval2,
+   key key1(field) key_opt1=kval1 key_opt2=kval2)
+  table_option1=tval1, table_option2=tval2;
+
+Exclusion should be made for old table and key (KEY_BLOCK_SIZE) options where
+'=' was not obligatory.
+
+For fields options can go with field attributes (NOT NULL, UNIQUE and so on) can
+be separated from them by '=' sign.


DESCRIPTION:

Add the ability to create a table with additional options that can be passed
to the engine. Also make existing options such as TRANSACTIONAL work via this
mechanism.


HIGH-LEVEL SPECIFICATION:

A table definition can look like the following:

CREATE TABLE table
  (field int ... field_opt1=fval1 field_opt2=fval2,
   key key1(field) key_opt1=kval1 key_opt2=kval2)
  table_option1=tval1, table_option2=tval2;

An exception should be made for old table and key (KEY_BLOCK_SIZE) options,
where '=' was not obligatory.

For fields, options can go together with field attributes (NOT NULL, UNIQUE
and so on) and can be distinguished from them by the '=' sign.


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
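
One possible reading of the last sentence of the specification above is that,
within a field definition, anything written as name=value is an engine option,
while bare words (NOT NULL, UNIQUE and so on) remain ordinary attributes. The
Python sketch below illustrates only that reading; the function name
split_field_definition is an assumption made for this example, and quoted
values containing spaces or '=' are not handled.

```python
def split_field_definition(definition: str):
    """Split the part after the field type into attributes and options.

    Tokens containing '=' are treated as options (name -> value); all
    other tokens are kept, in order, as plain attribute words.
    """
    attributes = []
    options = {}
    for token in definition.split():
        name, sep, value = token.partition("=")
        if sep:
            options[name] = value
        else:
            attributes.append(token)
    return attributes, options


if __name__ == "__main__":
    attrs, opts = split_field_definition(
        "NOT NULL UNIQUE field_opt1=fval1 field_opt2=fval2"
    )
    print(attrs)
    print(opts)
```

Because the '=' sign alone decides the category, options and attributes can be
interleaved in any order under this reading.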


Re: [Maria-developers] Federated code - IRC message
by Michael Widenius 11 Aug '09

11 Aug '09

Hi!

I am copying to maria-developers@ to ensure that everyone has a chance
to answer...

>>>>> "Patrick" == Patrick Galbraith <patg(a)patg.net> writes:

Patrick> Monty,
Patrick> I saw your message in IRC - I replied in case you don't see it . I want
Patrick> to get this into the tree soon and am only having a small problem right now:

Patrick> [08:06] <CaptTofu> montywi: I am striving to
Patrick> [08:08] <CaptTofu> montywi: I just have one issue to solve - if the
Patrick> engine is build as a plugin, how I can get the test to run. right now,
Patrick> when it runs, it doesn't find the engine loaded, so it skips the test. I
Patrick> tried to add a 'load plugin' to the test, but it can find the shared
Patrick> library because it expects it to be in "(errno: 2
Patrick> dlopen(/Users/patg/code_devel/federated/lib/mysql/plugin/ha_federatedx.so,
Patrick> 2): image not found)"

We should probably try to fix that for the test suite.
Kristian, do you have any ideas for this?

Patrick> So, I'm wondering if to test properly, one needs to compile the engine
Patrick> into the server versus as a plugin?

Yes, that is what you need to do (as far as I know).

Regards,
Monty


[Maria-developers] New (by Sanja): options for CREATE TABLE (43)
by worklog-noreply＠askmonty.org 11 Aug '09

11 Aug '09

-----------------------------------------------------------------------
                            WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: options for CREATE TABLE
CREATION DATE..: Tue, 11 Aug 2009, 17:02
SUPERVISOR.....: Bothorsen
IMPLEMENTOR....: Sanja
COPIES TO......: Monty
CATEGORY.......: Server-BackLog
TASK ID........: 43 (http://askmonty.org/worklog/?tid=43)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 32 (hours remain)
ORIG. ESTIMATE.: 32

PROGRESS NOTES:


DESCRIPTION:

Add the ability to create a table with additional options that can be passed
to the engine. Also make existing options such as TRANSACTIONAL work via this
mechanism.


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
