- developers - lists.mariadb.org

[Maria-developers] Updated (by Hakan): Add info to engine description (61)
by worklog-noreply＠askmonty.org 05 Nov '09

05 Nov '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Add info to engine description CREATION DATE..: Mon, 02 Nov 2009, 22:58 SUPERVISOR.....: Monty IMPLEMENTOR....: Sanja COPIES TO......: CATEGORY.......: Server-BackLog TASK ID........: 61 (http://askmonty.org/worklog/?tid=61) VERSION........: Server-5.1 STATUS.........: Assigned PRIORITY.......: 60 WORKED HOURS...: 0 ESTIMATE.......: 5 (hours remain) ORIG. ESTIMATE.: 5 PROGRESS NOTES: -=-=(Hakan - Thu, 05 Nov 2009, 21:09)=-=- High Level Description modified. --- /tmp/wklog.61.old.22402 2009-11-05 21:09:08.000000000 +0200 +++ /tmp/wklog.61.new.22402 2009-11-05 21:09:08.000000000 +0200 @@ -1,8 +1,8 @@ Add additional information about engine and show it in SHOW ENGINES: -License (PROPRIETARY, GPL, BSD) (it is present just have to be shown) +License (PROPRIETARY, GPL, BSD) (it is already present, just have to be shown) Maturity (TEST, ALPHA, BETA, GAMMA, RELEASE) -Version (just string from engine developer like "0.99 betta", better if it will +Version (just string from engine developer like "0.99 beta", better if it will be monotonically increasing in terms of alphabetical sort). DESCRIPTION: Add additional information about engine and show it in SHOW ENGINES: License (PROPRIETARY, GPL, BSD) (it is already present, just have to be shown) Maturity (TEST, ALPHA, BETA, GAMMA, RELEASE) Version (just string from engine developer like "0.99 beta", better if it will be monotonically increasing in terms of alphabetical sort). ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2796)
by knielsen＠knielsen-hq.org 05 Nov '09

05 Nov '09

#At lp:maria 2796 knielsen(a)knielsen-hq.org 2009-11-05 Fix running test suite from installed package. Some files are in different locations in this case. modified: mysql-test/lib/mtr_cases.pm mysql-test/mysql-test-run.pl === modified file 'mysql-test/lib/mtr_cases.pm' --- a/mysql-test/lib/mtr_cases.pm 2009-10-07 22:59:07 +0000 +++ b/mysql-test/lib/mtr_cases.pm 2009-11-05 13:40:01 +0000 @@ -249,6 +249,10 @@ sub collect_one_suite($) $suitedir= my_find_dir($::basedir, ["mysql-test/suite", "mysql-test", + "share/mysql-test/suite", + "share/mysql-test", + "share/mysql/mysql-test/suite", + "share/mysql/mysql-test", # Look in storage engine specific suite dirs "storage/*/mysql-test-suites" ], === modified file 'mysql-test/mysql-test-run.pl' --- a/mysql-test/mysql-test-run.pl 2009-10-07 22:57:43 +0000 +++ b/mysql-test/mysql-test-run.pl 2009-11-05 13:40:01 +0000 @@ -1001,6 +1001,12 @@ sub command_line_setup { { $basedir= dirname($basedir); } + # For .deb, it's like RPM, but installed in /usr/share/mysql/mysql-test. + # So move up one more directory level yet. + if ( ! $source_dist and ! -d "$basedir/bin" ) + { + $basedir= dirname($basedir); + } # Look for the client binaries directory if ($path_client_bindir)

1 0

[Maria-developers] bzr commit into MariaDB 5.1, with Maria 1.5:maria branch (knielsen:2796)
by knielsen＠knielsen-hq.org 05 Nov '09

05 Nov '09

#At lp:maria 2796 knielsen(a)knielsen-hq.org 2009-11-05 MBug#471160: Fix testsuite to work when run from installed packages. Some files are in different locations in this case. modified: mysql-test/lib/mtr_cases.pm mysql-test/mysql-test-run.pl === modified file 'mysql-test/lib/mtr_cases.pm' --- a/mysql-test/lib/mtr_cases.pm 2009-10-07 22:59:07 +0000 +++ b/mysql-test/lib/mtr_cases.pm 2009-11-05 13:37:47 +0000 @@ -249,6 +249,10 @@ sub collect_one_suite($) $suitedir= my_find_dir($::basedir, ["mysql-test/suite", "mysql-test", + "share/mysql-test/suite", + "share/mysql-test", + "share/mysql/mysql-test/suite", + "share/mysql/mysql-test", # Look in storage engine specific suite dirs "storage/*/mysql-test-suites" ], === modified file 'mysql-test/mysql-test-run.pl' --- a/mysql-test/mysql-test-run.pl 2009-10-07 22:57:43 +0000 +++ b/mysql-test/mysql-test-run.pl 2009-11-05 13:37:47 +0000 @@ -1001,6 +1001,12 @@ sub command_line_setup { { $basedir= dirname($basedir); } + # For .deb, it's like RPM, but installed in /usr/share/mysql/mysql-test. + # So move up one more directory level yet. + if ( ! $source_dist and ! -d "$basedir/bin" ) + { + $basedir= dirname($basedir); + } # Look for the client binaries directory if ($path_client_bindir)

1 0

[Maria-developers] Updated (by Alexi): Add a mysqlbinlog option to filter updates to certain tables (40)
by worklog-noreply＠askmonty.org 05 Nov '09

05 Nov '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Add a mysqlbinlog option to filter updates to certain tables CREATION DATE..: Mon, 10 Aug 2009, 13:25 SUPERVISOR.....: Monty IMPLEMENTOR....: COPIES TO......: Psergey CATEGORY.......: Server-RawIdeaBin TASK ID........: 40 (http://askmonty.org/worklog/?tid=40) VERSION........: Server-9.x STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 2 ESTIMATE.......: 48 (hours remain) ORIG. ESTIMATE.: 48 PROGRESS NOTES: -=-=(Alexi - Thu, 05 Nov 2009, 12:37)=-=- Low Level Design modified. --- /tmp/wklog.40.old.11441 2009-11-05 12:37:16.000000000 +0200 +++ /tmp/wklog.40.new.11441 2009-11-05 12:37:16.000000000 +0200 @@ -1,9 +1,18 @@ OPTION: 2.5 Extend Query Events With Tables Info ================================================ -1. Query_log_event Binary Format -******************************** -Changes to be done: +1. Adding --binlog-with-tables-info option +****************************************** + +When set, Query events are to be written in the extended binary +format which contains tables_info. When not set, Query events +are to be written in usual format (without any changes). + +2. Query event extended binary format +************************************* + +When --binlog-with-tables-info is set, Query events are writen +to binary log in the following (extended) format. Query_log_event binary format --------------------------------- @@ -24,12 +33,12 @@ error_code 2 status_vars_len 2 + query_len 2 (see Note 1) -+ tables_info_len 2 (see Note 2) ++ tables_info_len 2 --------------------------------- BODY: status_vars status_vars_len - db db_len + 1 -+ db db_len (see Note 3) ++ db db_len (see Note 2) query query_len + tables_info @@ -37,7 +46,7 @@ --------------------------------- Name Size (bytes) --------------------------------- - db_len 1 (see Note 4) + db_len 1 (see Note 3) db db_len table_name_len 1 table_name table_name_len @@ -48,19 +57,99 @@ table_name table_name_len NOTES -1. Currently Query_log_event format doesn't include 'query_len' because +1. In usual format, Query_log_event doesn't include 'query_len' because it considers the query to extent to the end of the event. -2. If tables_info is not included in the event (--binlog-with-tables-info - option), tables_info_len = 0. -3. The trailing zero is redundant since the length is already known. -4. In case of db = current db, db_len = 0 and db = empty, because - current db is already included in the current event format. +2. For 'db' (current db) the trailing zero is redundant since the length + is already known. +3. db_len = 0 means that this is the current db. + +When reading Query events from binary log, we can recognize its format +by its post-header length: in extended case the post-header includes 4 +additional bytes. + + #define QUERY_HEADER_LEN (QUERY_HEADER_MINIMAL_LEN + 4) ++ #define QUERY_HEADER_LEN_EXT (QUERY_HEADER_LEN + 4) + ... + #define Q_STATUS_VARS_LEN_OFFSET 11 ++ #define Q_QUERY_LEN_OFFSET Q_STATUS_VARS_LEN_OFFSET + 2 ++ #define Q_QUERY_TABLES_INFO_LEN_OFFSET Q_QUERY_LEN_OFFSET + 2 + +3. Changes in log events +************************ + +3.1. Format description event +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Changes needed here concern setting post-header length for Query events. +This setting is done in Format description event constructor which creates +the event for writing to binary log: + + if (binlog_with_tables_info) + post_header_len[QUERY_EVENT - 1] = QUERY_HEADER_LEN_EXT; + else + post_header_len[QUERY_EVENT - 1] = QUERY_HEADER_LEN; + +This change is to be done only for case binlog_ver = 4. + +NOTE. The refered above constructor is allowed to be invoked in a client +context for creating "artificial" Format description events in case of +MySQL < 5.0 (e.g. see mysqlbinlog code). To avoid compilation problems +(because of 'binlog_with_tables_info') and taking into account the +"MySQL < 5.0" restriction, we have to #ifdef out the above code in +following manner: + + switch (binlog_ver) { + case 4: /* MySQL 5.0 and higher */ ++ #ifndef MYSQL_CLIENT + ... ++ #else ++ <error> ++ #endif + case 1: + case 3: + ... + } + +3.2. Query event +~~~~~~~~~~~~~~~~ +Changes needed here include adding tables_info and tables_info_len +members (member for query length already exists) and modifying the +following function-members: + +Query_log_event(buf) constructor +-------------------------------- +[Parses binary format written to the 'buf'] + +Getting post-header length from the Format description event (passed +to the constructor as an argument), define whether buf contains an +extended or usual Query event and parse the buf contents accordingly. + +NOTE. Defining Query event format here should be done with taking into +account that this constructor can be called within a Query-derived +event with the event_type argument != QUERY_EVENT. + +Query_log_event(thd) constructor +-------------------------------- +[Creates the event for binlogging] + +In case of binlog_with_tables_info = TRUE, set additionally query_len, +tables_info_len, and tables_info members (the constructor is to have +an additional 'tables_info' argument). + +write() function +---------------- +[Writes the event to binlog] + +In case of binlog_with_tables_info = TRUE, write additional members +(query_len, tables_info_len, and tables_info) to binary log. Also +write corresponding whole event length to the common-header. + +<To be continued> -2. Where to get tables info from? +4. Where to get tables info from? ********************************* -2.1. Case study: CREATE TABLE -****************************** +4.1. Case study: CREATE TABLE +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *** CREATE TABLE table [SELECT ...] @@ -129,4 +218,4 @@ } } -To be continued +<To be continued> -=-=(Alexi - Wed, 04 Nov 2009, 10:21)=-=- Low Level Design modified. --- /tmp/wklog.40.old.6734 2009-11-04 10:21:20.000000000 +0200 +++ /tmp/wklog.40.new.6734 2009-11-04 10:21:20.000000000 +0200 @@ -21,9 +21,9 @@ slave_proxy_id 4 exec_time 4 db_len 1 -+ query_len 2 (see Note 1) error_code 2 status_vars_len 2 ++ query_len 2 (see Note 1) + tables_info_len 2 (see Note 2) --------------------------------- BODY: -=-=(Alexi - Tue, 03 Nov 2009, 11:19)=-=- Low Level Design modified. --- /tmp/wklog.40.old.7187 2009-11-03 11:19:22.000000000 +0200 +++ /tmp/wklog.40.new.7187 2009-11-03 11:19:22.000000000 +0200 @@ -1 +1,132 @@ +OPTION: 2.5 Extend Query Events With Tables Info +================================================ +1. Query_log_event Binary Format +******************************** +Changes to be done: + + Query_log_event binary format + --------------------------------- + Name Size (bytes) + --------------------------------- + COMMON HEADER: + timestamp 4 + type 1 + server_id 4 + total_size 4 + master_position 4 + flags 2 + --------------------------------- + POST HEADER: + slave_proxy_id 4 + exec_time 4 + db_len 1 ++ query_len 2 (see Note 1) + error_code 2 + status_vars_len 2 ++ tables_info_len 2 (see Note 2) + --------------------------------- + BODY: + status_vars status_vars_len +- db db_len + 1 ++ db db_len (see Note 3) + query query_len ++ tables_info + + tables_info binary format + --------------------------------- + Name Size (bytes) + --------------------------------- + db_len 1 (see Note 4) + db db_len + table_name_len 1 + table_name table_name_len + ... + db_len 1 + db db_len + table_name_len 1 + table_name table_name_len + +NOTES +1. Currently Query_log_event format doesn't include 'query_len' because + it considers the query to extent to the end of the event. +2. If tables_info is not included in the event (--binlog-with-tables-info + option), tables_info_len = 0. +3. The trailing zero is redundant since the length is already known. +4. In case of db = current db, db_len = 0 and db = empty, because + current db is already included in the current event format. + +2. Where to get tables info from? +********************************* + +2.1. Case study: CREATE TABLE +****************************** + +*** CREATE TABLE table [SELECT ...] + + bool mysql_create_table_no_lock( + THD *thd, + const char *db, + const char *table_name, ...) + { + ... + // ------------------------------------- + // WL40: To be included in tables_info: + // * db, table_name + // * thd->lex->query_tables (tables refered to in + // the select-part; empty if no select-part) + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + +*** CREATE TABLE table LIKE src-table + + bool mysql_create_like_table( + ... + TABLE_LIST *table, + TABLE_LIST *src_table, + ...) + { + ... + if (thd->current_stmt_binlog_row_based) + { // RBR: In this case we don't replicate temp tables + if (!(create_info->options & HA_LEX_CREATE_TMP_TABLE)) + { + if (src_table->table->s->tmp_table) + { // CREATE normal-table LIKE temp-table: + + // Generate new query without LIKE-part + store_create_info(thd, table, &query, create_info, FALSE); + + // ------------------------------------- + // WL40: To include to tables_info: + // * table (src_table is not included) + // ------------------------------------- + write_bin_log(thd, TRUE, query.ptr(), query.length()); + } + else + { // CREATE normal-table LIKE normal-table + + // ------------------------------------- + // WL40: To include to log_tables_info: + // * table + // * src_table + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + } + // CREATE temp-table LIKE ... + // This case is not replicated + } + else + { // SBR: + // ------------------------------------- + // WL40: To include to tables_info: + // * table + // * src_table + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + } + +To be continued -=-=(Alexi - Mon, 02 Nov 2009, 11:34)=-=- Worked 2 hours on option 2.5 Worked 2 hours and estimate 48 hours remain (original estimate increased by 50 hours). -=-=(Alexi - Mon, 02 Nov 2009, 11:20)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.4848 2009-11-02 11:20:24.000000000 +0200 +++ /tmp/wklog.40.new.4848 2009-11-02 11:20:24.000000000 +0200 @@ -90,3 +90,25 @@ It might be useful to integrate this with the code that already handles --replicate-ignore-db and similar slave options. +2.5 Extend Query Events With Tables Info +---------------------------------------- + +We could extend query events structure with a tables info - a list of tables +which the query refers to: + + <current query event structure> + tables_info_len + dbase_len dbase + table_len table + ... + dbase_len dbase + table_len table + +Note. In case of <dbase> = current data base, we can set dbase_len = 0 + and dbase = empty because current query event structure already + includes current data base name. + +Note. Possibly it is reasonable also to add a --binlog-with-tables-info + option which defines whether tables info must be included to the + query events. + -=-=(Knielsen - Fri, 14 Aug 2009, 15:47)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.10896 2009-08-14 15:47:39.000000000 +0300 +++ /tmp/wklog.40.new.10896 2009-08-14 15:47:39.000000000 +0300 @@ -72,3 +72,21 @@ /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... and further processing in mysqlbinlog will be trivial. + +2.4 Implement server functionality to ignore certain tables +----------------------------------------------------------- + +We could add a general facility in the server to ignore certain tables: + + SET SESSION ignored_tables = "db1.t1,db2.t2"; + +This would work similar to --replicate-ignore-table, but in a general way not +restricted to the slave SQL thread. + +It would then be trivial for mysqlbinlog to add such statements at the start +of the output, or probably the user could just do it manually with no need for +additional options for mysqlbinlog. + +It might be useful to integrate this with the code that already handles +--replicate-ignore-db and similar slave options. + -=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.12989 2009-08-10 15:41:23.000000000 +0300 +++ /tmp/wklog.40.new.12989 2009-08-10 15:41:23.000000000 +0300 @@ -1,6 +1,7 @@ - 1. Context ---------- +(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global +overview) At the moment, the server has these replication slave options: --replicate-do-table=db.tbl -=-=(Guest - Mon, 10 Aug 2009, 14:52)=-=- Dependency created: 39 now depends on 40 -=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=- High Level Description modified. --- /tmp/wklog.40.old.16985 2009-08-10 14:51:59.000000000 +0300 +++ /tmp/wklog.40.new.16985 2009-08-10 14:51:59.000000000 +0300 @@ -1,3 +1,4 @@ Replication slave can be set to filter updates to certain tables with ---replicate-[wild-]{do,ignore}-table options. This task is about adding similar -functionality to mysqlbinlog. +--replicate-[wild-]{do,ignore}-table options. + +This task is about adding similar functionality to mysqlbinlog. -=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.16949 2009-08-10 14:51:33.000000000 +0300 +++ /tmp/wklog.40.new.16949 2009-08-10 14:51:33.000000000 +0300 @@ -1 +1,73 @@ +1. Context +---------- +At the moment, the server has these replication slave options: + + --replicate-do-table=db.tbl + --replicate-ignore-table=db.tbl + --replicate-wild-do-table=pattern.pattern + --replicate-wild-ignore-table=pattern.pattern + +They affect both RBR and SBR events. SBR events are checked after the +statement has been parsed, the server iterates over list of used tables and +checks them againist --replicate instructions. + +What is interesting is that this scheme still allows to update the ignored +table through a VIEW. + +2. Table filtering in mysqlbinlog +--------------------------------- + +Per-table filtering of RBR events is easy (as it is relatively easy to extract +the name of the table that the event applies to). + +Per-table filtering of SBR events is hard, as generally it is not apparent +which tables the statement refers to. + +This opens possible options: + +2.1 Put the parser into mysqlbinlog +----------------------------------- +Once we have a full parser in mysqlbinlog, we'll be able to check which tables +are used by a statement, and will allow to show behaviour identical to those +that one obtains when using --replicate-* slave options. + +(It is not clear how much effort is needed to put the parser into mysqlbinlog. +Any guesses?) + + +2.2 Use dumb regexp match +------------------------- +Use a really dumb approach. A query is considered to be modifying table X if +it matches an expression + +CREATE TABLE $tablename +DROP $tablename +UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET' +DELETE ...$tablename ... WHERE // same as above +ALTER TABLE $tablename +.. etc (go get from the grammar) .. + +The advantage over doing the same in awk is that mysqlbinlog will also process +RBR statements, and together with that will provide a working solution for +those who are careful with their table names not mixing with string constants +and such. + +(TODO: string constants are of particular concern as they come from +[potentially hostile] users, unlike e.g. table aliases which come from +[not hostile] developers. Remove also all string constants before attempting +to do match?) + +2.3 Have the master put annotations +----------------------------------- +We could add a master option so that it injects into query a mark that tells +which tables the query will affect, e.g. for the query + + UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ... + + +the binlog will have + + /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... + +and further processing in mysqlbinlog will be trivial. DESCRIPTION: Replication slave can be set to filter updates to certain tables with --replicate-[wild-]{do,ignore}-table options. This task is about adding similar functionality to mysqlbinlog. HIGH-LEVEL SPECIFICATION: 1. Context ---------- (See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global overview) At the moment, the server has these replication slave options: --replicate-do-table=db.tbl --replicate-ignore-table=db.tbl --replicate-wild-do-table=pattern.pattern --replicate-wild-ignore-table=pattern.pattern They affect both RBR and SBR events. SBR events are checked after the statement has been parsed, the server iterates over list of used tables and checks them againist --replicate instructions. What is interesting is that this scheme still allows to update the ignored table through a VIEW. 2. Table filtering in mysqlbinlog --------------------------------- Per-table filtering of RBR events is easy (as it is relatively easy to extract the name of the table that the event applies to). Per-table filtering of SBR events is hard, as generally it is not apparent which tables the statement refers to. This opens possible options: 2.1 Put the parser into mysqlbinlog ----------------------------------- Once we have a full parser in mysqlbinlog, we'll be able to check which tables are used by a statement, and will allow to show behaviour identical to those that one obtains when using --replicate-* slave options. (It is not clear how much effort is needed to put the parser into mysqlbinlog. Any guesses?) 2.2 Use dumb regexp match ------------------------- Use a really dumb approach. A query is considered to be modifying table X if it matches an expression CREATE TABLE $tablename DROP $tablename UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET' DELETE ...$tablename ... WHERE // same as above ALTER TABLE $tablename .. etc (go get from the grammar) .. The advantage over doing the same in awk is that mysqlbinlog will also process RBR statements, and together with that will provide a working solution for those who are careful with their table names not mixing with string constants and such. (TODO: string constants are of particular concern as they come from [potentially hostile] users, unlike e.g. table aliases which come from [not hostile] developers. Remove also all string constants before attempting to do match?) 2.3 Have the master put annotations ----------------------------------- We could add a master option so that it injects into query a mark that tells which tables the query will affect, e.g. for the query UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ... the binlog will have /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... and further processing in mysqlbinlog will be trivial. 2.4 Implement server functionality to ignore certain tables ----------------------------------------------------------- We could add a general facility in the server to ignore certain tables: SET SESSION ignored_tables = "db1.t1,db2.t2"; This would work similar to --replicate-ignore-table, but in a general way not restricted to the slave SQL thread. It would then be trivial for mysqlbinlog to add such statements at the start of the output, or probably the user could just do it manually with no need for additional options for mysqlbinlog. It might be useful to integrate this with the code that already handles --replicate-ignore-db and similar slave options. 2.5 Extend Query Events With Tables Info ---------------------------------------- We could extend query events structure with a tables info - a list of tables which the query refers to: <current query event structure> tables_info_len dbase_len dbase table_len table ... dbase_len dbase table_len table Note. In case of <dbase> = current data base, we can set dbase_len = 0 and dbase = empty because current query event structure already includes current data base name. Note. Possibly it is reasonable also to add a --binlog-with-tables-info option which defines whether tables info must be included to the query events. LOW-LEVEL DESIGN: OPTION: 2.5 Extend Query Events With Tables Info ================================================ 1. Adding --binlog-with-tables-info option ****************************************** When set, Query events are to be written in the extended binary format which contains tables_info. When not set, Query events are to be written in usual format (without any changes). 2. Query event extended binary format ************************************* When --binlog-with-tables-info is set, Query events are writen to binary log in the following (extended) format. Query_log_event binary format --------------------------------- Name Size (bytes) --------------------------------- COMMON HEADER: timestamp 4 type 1 server_id 4 total_size 4 master_position 4 flags 2 --------------------------------- POST HEADER: slave_proxy_id 4 exec_time 4 db_len 1 error_code 2 status_vars_len 2 + query_len 2 (see Note 1) + tables_info_len 2 --------------------------------- BODY: status_vars status_vars_len - db db_len + 1 + db db_len (see Note 2) query query_len + tables_info tables_info binary format --------------------------------- Name Size (bytes) --------------------------------- db_len 1 (see Note 3) db db_len table_name_len 1 table_name table_name_len ... db_len 1 db db_len table_name_len 1 table_name table_name_len NOTES 1. In usual format, Query_log_event doesn't include 'query_len' because it considers the query to extent to the end of the event. 2. For 'db' (current db) the trailing zero is redundant since the length is already known. 3. db_len = 0 means that this is the current db. When reading Query events from binary log, we can recognize its format by its post-header length: in extended case the post-header includes 4 additional bytes. #define QUERY_HEADER_LEN (QUERY_HEADER_MINIMAL_LEN + 4) + #define QUERY_HEADER_LEN_EXT (QUERY_HEADER_LEN + 4) ... #define Q_STATUS_VARS_LEN_OFFSET 11 + #define Q_QUERY_LEN_OFFSET Q_STATUS_VARS_LEN_OFFSET + 2 + #define Q_QUERY_TABLES_INFO_LEN_OFFSET Q_QUERY_LEN_OFFSET + 2 3. Changes in log events ************************ 3.1. Format description event ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Changes needed here concern setting post-header length for Query events. This setting is done in Format description event constructor which creates the event for writing to binary log: if (binlog_with_tables_info) post_header_len[QUERY_EVENT - 1] = QUERY_HEADER_LEN_EXT; else post_header_len[QUERY_EVENT - 1] = QUERY_HEADER_LEN; This change is to be done only for case binlog_ver = 4. NOTE. The refered above constructor is allowed to be invoked in a client context for creating "artificial" Format description events in case of MySQL < 5.0 (e.g. see mysqlbinlog code). To avoid compilation problems (because of 'binlog_with_tables_info') and taking into account the "MySQL < 5.0" restriction, we have to #ifdef out the above code in following manner: switch (binlog_ver) { case 4: /* MySQL 5.0 and higher */ + #ifndef MYSQL_CLIENT ... + #else + <error> + #endif case 1: case 3: ... } 3.2. Query event ~~~~~~~~~~~~~~~~ Changes needed here include adding tables_info and tables_info_len members (member for query length already exists) and modifying the following function-members: Query_log_event(buf) constructor -------------------------------- [Parses binary format written to the 'buf'] Getting post-header length from the Format description event (passed to the constructor as an argument), define whether buf contains an extended or usual Query event and parse the buf contents accordingly. NOTE. Defining Query event format here should be done with taking into account that this constructor can be called within a Query-derived event with the event_type argument != QUERY_EVENT. Query_log_event(thd) constructor -------------------------------- [Creates the event for binlogging] In case of binlog_with_tables_info = TRUE, set additionally query_len, tables_info_len, and tables_info members (the constructor is to have an additional 'tables_info' argument). write() function ---------------- [Writes the event to binlog] In case of binlog_with_tables_info = TRUE, write additional members (query_len, tables_info_len, and tables_info) to binary log. Also write corresponding whole event length to the common-header. <To be continued> 4. Where to get tables info from? ********************************* 4.1. Case study: CREATE TABLE ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *** CREATE TABLE table [SELECT ...] bool mysql_create_table_no_lock( THD *thd, const char *db, const char *table_name, ...) { ... // ------------------------------------- // WL40: To be included in tables_info: // * db, table_name // * thd->lex->query_tables (tables refered to in // the select-part; empty if no select-part) // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } *** CREATE TABLE table LIKE src-table bool mysql_create_like_table( ... TABLE_LIST *table, TABLE_LIST *src_table, ...) { ... if (thd->current_stmt_binlog_row_based) { // RBR: In this case we don't replicate temp tables if (!(create_info->options & HA_LEX_CREATE_TMP_TABLE)) { if (src_table->table->s->tmp_table) { // CREATE normal-table LIKE temp-table: // Generate new query without LIKE-part store_create_info(thd, table, &query, create_info, FALSE); // ------------------------------------- // WL40: To include to tables_info: // * table (src_table is not included) // ------------------------------------- write_bin_log(thd, TRUE, query.ptr(), query.length()); } else { // CREATE normal-table LIKE normal-table // ------------------------------------- // WL40: To include to log_tables_info: // * table // * src_table // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } } // CREATE temp-table LIKE ... // This case is not replicated } else { // SBR: // ------------------------------------- // WL40: To include to tables_info: // * table // * src_table // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } } <To be continued> ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Updated (by Alexi): Add a mysqlbinlog option to filter updates to certain tables (40)
by worklog-noreply＠askmonty.org 05 Nov '09

05 Nov '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Add a mysqlbinlog option to filter updates to certain tables CREATION DATE..: Mon, 10 Aug 2009, 13:25 SUPERVISOR.....: Monty IMPLEMENTOR....: COPIES TO......: Psergey CATEGORY.......: Server-RawIdeaBin TASK ID........: 40 (http://askmonty.org/worklog/?tid=40) VERSION........: Server-9.x STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 2 ESTIMATE.......: 48 (hours remain) ORIG. ESTIMATE.: 48 PROGRESS NOTES: -=-=(Alexi - Thu, 05 Nov 2009, 12:37)=-=- Low Level Design modified. --- /tmp/wklog.40.old.11441 2009-11-05 12:37:16.000000000 +0200 +++ /tmp/wklog.40.new.11441 2009-11-05 12:37:16.000000000 +0200 @@ -1,9 +1,18 @@ OPTION: 2.5 Extend Query Events With Tables Info ================================================ -1. Query_log_event Binary Format -******************************** -Changes to be done: +1. Adding --binlog-with-tables-info option +****************************************** + +When set, Query events are to be written in the extended binary +format which contains tables_info. When not set, Query events +are to be written in usual format (without any changes). + +2. Query event extended binary format +************************************* + +When --binlog-with-tables-info is set, Query events are writen +to binary log in the following (extended) format. Query_log_event binary format --------------------------------- @@ -24,12 +33,12 @@ error_code 2 status_vars_len 2 + query_len 2 (see Note 1) -+ tables_info_len 2 (see Note 2) ++ tables_info_len 2 --------------------------------- BODY: status_vars status_vars_len - db db_len + 1 -+ db db_len (see Note 3) ++ db db_len (see Note 2) query query_len + tables_info @@ -37,7 +46,7 @@ --------------------------------- Name Size (bytes) --------------------------------- - db_len 1 (see Note 4) + db_len 1 (see Note 3) db db_len table_name_len 1 table_name table_name_len @@ -48,19 +57,99 @@ table_name table_name_len NOTES -1. Currently Query_log_event format doesn't include 'query_len' because +1. In usual format, Query_log_event doesn't include 'query_len' because it considers the query to extent to the end of the event. -2. If tables_info is not included in the event (--binlog-with-tables-info - option), tables_info_len = 0. -3. The trailing zero is redundant since the length is already known. -4. In case of db = current db, db_len = 0 and db = empty, because - current db is already included in the current event format. +2. For 'db' (current db) the trailing zero is redundant since the length + is already known. +3. db_len = 0 means that this is the current db. + +When reading Query events from binary log, we can recognize its format +by its post-header length: in extended case the post-header includes 4 +additional bytes. + + #define QUERY_HEADER_LEN (QUERY_HEADER_MINIMAL_LEN + 4) ++ #define QUERY_HEADER_LEN_EXT (QUERY_HEADER_LEN + 4) + ... + #define Q_STATUS_VARS_LEN_OFFSET 11 ++ #define Q_QUERY_LEN_OFFSET Q_STATUS_VARS_LEN_OFFSET + 2 ++ #define Q_QUERY_TABLES_INFO_LEN_OFFSET Q_QUERY_LEN_OFFSET + 2 + +3. Changes in log events +************************ + +3.1. Format description event +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Changes needed here concern setting post-header length for Query events. +This setting is done in Format description event constructor which creates +the event for writing to binary log: + + if (binlog_with_tables_info) + post_header_len[QUERY_EVENT - 1] = QUERY_HEADER_LEN_EXT; + else + post_header_len[QUERY_EVENT - 1] = QUERY_HEADER_LEN; + +This change is to be done only for case binlog_ver = 4. + +NOTE. The refered above constructor is allowed to be invoked in a client +context for creating "artificial" Format description events in case of +MySQL < 5.0 (e.g. see mysqlbinlog code). To avoid compilation problems +(because of 'binlog_with_tables_info') and taking into account the +"MySQL < 5.0" restriction, we have to #ifdef out the above code in +following manner: + + switch (binlog_ver) { + case 4: /* MySQL 5.0 and higher */ ++ #ifndef MYSQL_CLIENT + ... ++ #else ++ <error> ++ #endif + case 1: + case 3: + ... + } + +3.2. Query event +~~~~~~~~~~~~~~~~ +Changes needed here include adding tables_info and tables_info_len +members (member for query length already exists) and modifying the +following function-members: + +Query_log_event(buf) constructor +-------------------------------- +[Parses binary format written to the 'buf'] + +Getting post-header length from the Format description event (passed +to the constructor as an argument), define whether buf contains an +extended or usual Query event and parse the buf contents accordingly. + +NOTE. Defining Query event format here should be done with taking into +account that this constructor can be called within a Query-derived +event with the event_type argument != QUERY_EVENT. + +Query_log_event(thd) constructor +-------------------------------- +[Creates the event for binlogging] + +In case of binlog_with_tables_info = TRUE, set additionally query_len, +tables_info_len, and tables_info members (the constructor is to have +an additional 'tables_info' argument). + +write() function +---------------- +[Writes the event to binlog] + +In case of binlog_with_tables_info = TRUE, write additional members +(query_len, tables_info_len, and tables_info) to binary log. Also +write corresponding whole event length to the common-header. + +<To be continued> -2. Where to get tables info from? +4. Where to get tables info from? ********************************* -2.1. Case study: CREATE TABLE -****************************** +4.1. Case study: CREATE TABLE +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *** CREATE TABLE table [SELECT ...] @@ -129,4 +218,4 @@ } } -To be continued +<To be continued> -=-=(Alexi - Wed, 04 Nov 2009, 10:21)=-=- Low Level Design modified. --- /tmp/wklog.40.old.6734 2009-11-04 10:21:20.000000000 +0200 +++ /tmp/wklog.40.new.6734 2009-11-04 10:21:20.000000000 +0200 @@ -21,9 +21,9 @@ slave_proxy_id 4 exec_time 4 db_len 1 -+ query_len 2 (see Note 1) error_code 2 status_vars_len 2 ++ query_len 2 (see Note 1) + tables_info_len 2 (see Note 2) --------------------------------- BODY: -=-=(Alexi - Tue, 03 Nov 2009, 11:19)=-=- Low Level Design modified. --- /tmp/wklog.40.old.7187 2009-11-03 11:19:22.000000000 +0200 +++ /tmp/wklog.40.new.7187 2009-11-03 11:19:22.000000000 +0200 @@ -1 +1,132 @@ +OPTION: 2.5 Extend Query Events With Tables Info +================================================ +1. Query_log_event Binary Format +******************************** +Changes to be done: + + Query_log_event binary format + --------------------------------- + Name Size (bytes) + --------------------------------- + COMMON HEADER: + timestamp 4 + type 1 + server_id 4 + total_size 4 + master_position 4 + flags 2 + --------------------------------- + POST HEADER: + slave_proxy_id 4 + exec_time 4 + db_len 1 ++ query_len 2 (see Note 1) + error_code 2 + status_vars_len 2 ++ tables_info_len 2 (see Note 2) + --------------------------------- + BODY: + status_vars status_vars_len +- db db_len + 1 ++ db db_len (see Note 3) + query query_len ++ tables_info + + tables_info binary format + --------------------------------- + Name Size (bytes) + --------------------------------- + db_len 1 (see Note 4) + db db_len + table_name_len 1 + table_name table_name_len + ... + db_len 1 + db db_len + table_name_len 1 + table_name table_name_len + +NOTES +1. Currently Query_log_event format doesn't include 'query_len' because + it considers the query to extent to the end of the event. +2. If tables_info is not included in the event (--binlog-with-tables-info + option), tables_info_len = 0. +3. The trailing zero is redundant since the length is already known. +4. In case of db = current db, db_len = 0 and db = empty, because + current db is already included in the current event format. + +2. Where to get tables info from? +********************************* + +2.1. Case study: CREATE TABLE +****************************** + +*** CREATE TABLE table [SELECT ...] + + bool mysql_create_table_no_lock( + THD *thd, + const char *db, + const char *table_name, ...) + { + ... + // ------------------------------------- + // WL40: To be included in tables_info: + // * db, table_name + // * thd->lex->query_tables (tables refered to in + // the select-part; empty if no select-part) + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + +*** CREATE TABLE table LIKE src-table + + bool mysql_create_like_table( + ... + TABLE_LIST *table, + TABLE_LIST *src_table, + ...) + { + ... + if (thd->current_stmt_binlog_row_based) + { // RBR: In this case we don't replicate temp tables + if (!(create_info->options & HA_LEX_CREATE_TMP_TABLE)) + { + if (src_table->table->s->tmp_table) + { // CREATE normal-table LIKE temp-table: + + // Generate new query without LIKE-part + store_create_info(thd, table, &query, create_info, FALSE); + + // ------------------------------------- + // WL40: To include to tables_info: + // * table (src_table is not included) + // ------------------------------------- + write_bin_log(thd, TRUE, query.ptr(), query.length()); + } + else + { // CREATE normal-table LIKE normal-table + + // ------------------------------------- + // WL40: To include to log_tables_info: + // * table + // * src_table + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + } + // CREATE temp-table LIKE ... + // This case is not replicated + } + else + { // SBR: + // ------------------------------------- + // WL40: To include to tables_info: + // * table + // * src_table + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + } + +To be continued -=-=(Alexi - Mon, 02 Nov 2009, 11:34)=-=- Worked 2 hours on option 2.5 Worked 2 hours and estimate 48 hours remain (original estimate increased by 50 hours). -=-=(Alexi - Mon, 02 Nov 2009, 11:20)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.4848 2009-11-02 11:20:24.000000000 +0200 +++ /tmp/wklog.40.new.4848 2009-11-02 11:20:24.000000000 +0200 @@ -90,3 +90,25 @@ It might be useful to integrate this with the code that already handles --replicate-ignore-db and similar slave options. +2.5 Extend Query Events With Tables Info +---------------------------------------- + +We could extend query events structure with a tables info - a list of tables +which the query refers to: + + <current query event structure> + tables_info_len + dbase_len dbase + table_len table + ... + dbase_len dbase + table_len table + +Note. In case of <dbase> = current data base, we can set dbase_len = 0 + and dbase = empty because current query event structure already + includes current data base name. + +Note. Possibly it is reasonable also to add a --binlog-with-tables-info + option which defines whether tables info must be included to the + query events. + -=-=(Knielsen - Fri, 14 Aug 2009, 15:47)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.10896 2009-08-14 15:47:39.000000000 +0300 +++ /tmp/wklog.40.new.10896 2009-08-14 15:47:39.000000000 +0300 @@ -72,3 +72,21 @@ /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... and further processing in mysqlbinlog will be trivial. + +2.4 Implement server functionality to ignore certain tables +----------------------------------------------------------- + +We could add a general facility in the server to ignore certain tables: + + SET SESSION ignored_tables = "db1.t1,db2.t2"; + +This would work similar to --replicate-ignore-table, but in a general way not +restricted to the slave SQL thread. + +It would then be trivial for mysqlbinlog to add such statements at the start +of the output, or probably the user could just do it manually with no need for +additional options for mysqlbinlog. + +It might be useful to integrate this with the code that already handles +--replicate-ignore-db and similar slave options. + -=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.12989 2009-08-10 15:41:23.000000000 +0300 +++ /tmp/wklog.40.new.12989 2009-08-10 15:41:23.000000000 +0300 @@ -1,6 +1,7 @@ - 1. Context ---------- +(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global +overview) At the moment, the server has these replication slave options: --replicate-do-table=db.tbl -=-=(Guest - Mon, 10 Aug 2009, 14:52)=-=- Dependency created: 39 now depends on 40 -=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=- High Level Description modified. --- /tmp/wklog.40.old.16985 2009-08-10 14:51:59.000000000 +0300 +++ /tmp/wklog.40.new.16985 2009-08-10 14:51:59.000000000 +0300 @@ -1,3 +1,4 @@ Replication slave can be set to filter updates to certain tables with ---replicate-[wild-]{do,ignore}-table options. This task is about adding similar -functionality to mysqlbinlog. +--replicate-[wild-]{do,ignore}-table options. + +This task is about adding similar functionality to mysqlbinlog. -=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.16949 2009-08-10 14:51:33.000000000 +0300 +++ /tmp/wklog.40.new.16949 2009-08-10 14:51:33.000000000 +0300 @@ -1 +1,73 @@ +1. Context +---------- +At the moment, the server has these replication slave options: + + --replicate-do-table=db.tbl + --replicate-ignore-table=db.tbl + --replicate-wild-do-table=pattern.pattern + --replicate-wild-ignore-table=pattern.pattern + +They affect both RBR and SBR events. SBR events are checked after the +statement has been parsed, the server iterates over list of used tables and +checks them againist --replicate instructions. + +What is interesting is that this scheme still allows to update the ignored +table through a VIEW. + +2. Table filtering in mysqlbinlog +--------------------------------- + +Per-table filtering of RBR events is easy (as it is relatively easy to extract +the name of the table that the event applies to). + +Per-table filtering of SBR events is hard, as generally it is not apparent +which tables the statement refers to. + +This opens possible options: + +2.1 Put the parser into mysqlbinlog +----------------------------------- +Once we have a full parser in mysqlbinlog, we'll be able to check which tables +are used by a statement, and will allow to show behaviour identical to those +that one obtains when using --replicate-* slave options. + +(It is not clear how much effort is needed to put the parser into mysqlbinlog. +Any guesses?) + + +2.2 Use dumb regexp match +------------------------- +Use a really dumb approach. A query is considered to be modifying table X if +it matches an expression + +CREATE TABLE $tablename +DROP $tablename +UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET' +DELETE ...$tablename ... WHERE // same as above +ALTER TABLE $tablename +.. etc (go get from the grammar) .. + +The advantage over doing the same in awk is that mysqlbinlog will also process +RBR statements, and together with that will provide a working solution for +those who are careful with their table names not mixing with string constants +and such. + +(TODO: string constants are of particular concern as they come from +[potentially hostile] users, unlike e.g. table aliases which come from +[not hostile] developers. Remove also all string constants before attempting +to do match?) + +2.3 Have the master put annotations +----------------------------------- +We could add a master option so that it injects into query a mark that tells +which tables the query will affect, e.g. for the query + + UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ... + + +the binlog will have + + /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... + +and further processing in mysqlbinlog will be trivial. DESCRIPTION: Replication slave can be set to filter updates to certain tables with --replicate-[wild-]{do,ignore}-table options. This task is about adding similar functionality to mysqlbinlog. HIGH-LEVEL SPECIFICATION: 1. Context ---------- (See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global overview) At the moment, the server has these replication slave options: --replicate-do-table=db.tbl --replicate-ignore-table=db.tbl --replicate-wild-do-table=pattern.pattern --replicate-wild-ignore-table=pattern.pattern They affect both RBR and SBR events. SBR events are checked after the statement has been parsed, the server iterates over list of used tables and checks them againist --replicate instructions. What is interesting is that this scheme still allows to update the ignored table through a VIEW. 2. Table filtering in mysqlbinlog --------------------------------- Per-table filtering of RBR events is easy (as it is relatively easy to extract the name of the table that the event applies to). Per-table filtering of SBR events is hard, as generally it is not apparent which tables the statement refers to. This opens possible options: 2.1 Put the parser into mysqlbinlog ----------------------------------- Once we have a full parser in mysqlbinlog, we'll be able to check which tables are used by a statement, and will allow to show behaviour identical to those that one obtains when using --replicate-* slave options. (It is not clear how much effort is needed to put the parser into mysqlbinlog. Any guesses?) 2.2 Use dumb regexp match ------------------------- Use a really dumb approach. A query is considered to be modifying table X if it matches an expression CREATE TABLE $tablename DROP $tablename UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET' DELETE ...$tablename ... WHERE // same as above ALTER TABLE $tablename .. etc (go get from the grammar) .. The advantage over doing the same in awk is that mysqlbinlog will also process RBR statements, and together with that will provide a working solution for those who are careful with their table names not mixing with string constants and such. (TODO: string constants are of particular concern as they come from [potentially hostile] users, unlike e.g. table aliases which come from [not hostile] developers. Remove also all string constants before attempting to do match?) 2.3 Have the master put annotations ----------------------------------- We could add a master option so that it injects into query a mark that tells which tables the query will affect, e.g. for the query UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ... the binlog will have /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... and further processing in mysqlbinlog will be trivial. 2.4 Implement server functionality to ignore certain tables ----------------------------------------------------------- We could add a general facility in the server to ignore certain tables: SET SESSION ignored_tables = "db1.t1,db2.t2"; This would work similar to --replicate-ignore-table, but in a general way not restricted to the slave SQL thread. It would then be trivial for mysqlbinlog to add such statements at the start of the output, or probably the user could just do it manually with no need for additional options for mysqlbinlog. It might be useful to integrate this with the code that already handles --replicate-ignore-db and similar slave options. 2.5 Extend Query Events With Tables Info ---------------------------------------- We could extend query events structure with a tables info - a list of tables which the query refers to: <current query event structure> tables_info_len dbase_len dbase table_len table ... dbase_len dbase table_len table Note. In case of <dbase> = current data base, we can set dbase_len = 0 and dbase = empty because current query event structure already includes current data base name. Note. Possibly it is reasonable also to add a --binlog-with-tables-info option which defines whether tables info must be included to the query events. LOW-LEVEL DESIGN: OPTION: 2.5 Extend Query Events With Tables Info ================================================ 1. Adding --binlog-with-tables-info option ****************************************** When set, Query events are to be written in the extended binary format which contains tables_info. When not set, Query events are to be written in usual format (without any changes). 2. Query event extended binary format ************************************* When --binlog-with-tables-info is set, Query events are writen to binary log in the following (extended) format. Query_log_event binary format --------------------------------- Name Size (bytes) --------------------------------- COMMON HEADER: timestamp 4 type 1 server_id 4 total_size 4 master_position 4 flags 2 --------------------------------- POST HEADER: slave_proxy_id 4 exec_time 4 db_len 1 error_code 2 status_vars_len 2 + query_len 2 (see Note 1) + tables_info_len 2 --------------------------------- BODY: status_vars status_vars_len - db db_len + 1 + db db_len (see Note 2) query query_len + tables_info tables_info binary format --------------------------------- Name Size (bytes) --------------------------------- db_len 1 (see Note 3) db db_len table_name_len 1 table_name table_name_len ... db_len 1 db db_len table_name_len 1 table_name table_name_len NOTES 1. In usual format, Query_log_event doesn't include 'query_len' because it considers the query to extent to the end of the event. 2. For 'db' (current db) the trailing zero is redundant since the length is already known. 3. db_len = 0 means that this is the current db. When reading Query events from binary log, we can recognize its format by its post-header length: in extended case the post-header includes 4 additional bytes. #define QUERY_HEADER_LEN (QUERY_HEADER_MINIMAL_LEN + 4) + #define QUERY_HEADER_LEN_EXT (QUERY_HEADER_LEN + 4) ... #define Q_STATUS_VARS_LEN_OFFSET 11 + #define Q_QUERY_LEN_OFFSET Q_STATUS_VARS_LEN_OFFSET + 2 + #define Q_QUERY_TABLES_INFO_LEN_OFFSET Q_QUERY_LEN_OFFSET + 2 3. Changes in log events ************************ 3.1. Format description event ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Changes needed here concern setting post-header length for Query events. This setting is done in Format description event constructor which creates the event for writing to binary log: if (binlog_with_tables_info) post_header_len[QUERY_EVENT - 1] = QUERY_HEADER_LEN_EXT; else post_header_len[QUERY_EVENT - 1] = QUERY_HEADER_LEN; This change is to be done only for case binlog_ver = 4. NOTE. The refered above constructor is allowed to be invoked in a client context for creating "artificial" Format description events in case of MySQL < 5.0 (e.g. see mysqlbinlog code). To avoid compilation problems (because of 'binlog_with_tables_info') and taking into account the "MySQL < 5.0" restriction, we have to #ifdef out the above code in following manner: switch (binlog_ver) { case 4: /* MySQL 5.0 and higher */ + #ifndef MYSQL_CLIENT ... + #else + <error> + #endif case 1: case 3: ... } 3.2. Query event ~~~~~~~~~~~~~~~~ Changes needed here include adding tables_info and tables_info_len members (member for query length already exists) and modifying the following function-members: Query_log_event(buf) constructor -------------------------------- [Parses binary format written to the 'buf'] Getting post-header length from the Format description event (passed to the constructor as an argument), define whether buf contains an extended or usual Query event and parse the buf contents accordingly. NOTE. Defining Query event format here should be done with taking into account that this constructor can be called within a Query-derived event with the event_type argument != QUERY_EVENT. Query_log_event(thd) constructor -------------------------------- [Creates the event for binlogging] In case of binlog_with_tables_info = TRUE, set additionally query_len, tables_info_len, and tables_info members (the constructor is to have an additional 'tables_info' argument). write() function ---------------- [Writes the event to binlog] In case of binlog_with_tables_info = TRUE, write additional members (query_len, tables_info_len, and tables_info) to binary log. Also write corresponding whole event length to the common-header. <To be continued> 4. Where to get tables info from? ********************************* 4.1. Case study: CREATE TABLE ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *** CREATE TABLE table [SELECT ...] bool mysql_create_table_no_lock( THD *thd, const char *db, const char *table_name, ...) { ... // ------------------------------------- // WL40: To be included in tables_info: // * db, table_name // * thd->lex->query_tables (tables refered to in // the select-part; empty if no select-part) // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } *** CREATE TABLE table LIKE src-table bool mysql_create_like_table( ... TABLE_LIST *table, TABLE_LIST *src_table, ...) { ... if (thd->current_stmt_binlog_row_based) { // RBR: In this case we don't replicate temp tables if (!(create_info->options & HA_LEX_CREATE_TMP_TABLE)) { if (src_table->table->s->tmp_table) { // CREATE normal-table LIKE temp-table: // Generate new query without LIKE-part store_create_info(thd, table, &query, create_info, FALSE); // ------------------------------------- // WL40: To include to tables_info: // * table (src_table is not included) // ------------------------------------- write_bin_log(thd, TRUE, query.ptr(), query.length()); } else { // CREATE normal-table LIKE normal-table // ------------------------------------- // WL40: To include to log_tables_info: // * table // * src_table // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } } // CREATE temp-table LIKE ... // This case is not replicated } else { // SBR: // ------------------------------------- // WL40: To include to tables_info: // * table // * src_table // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } } <To be continued> ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Rev 2731: MWL#36: Add a mysqlbinlog option to change the used database in file:///home/psergey/dev/maria-5.2/
by Sergey Petrunya 04 Nov '09

04 Nov '09

At file:///home/psergey/dev/maria-5.2/ ------------------------------------------------------------ revno: 2731 revision-id: psergey(a)askmonty.org-20091104224158-nk2s2luvlqwa02bl parent: monty(a)askmonty.org-20091028221427-frz7593084t6osmc parent: alexi1952(a)yandex.ru-20091027134247-2efxqunerou2hykv committer: Sergey Petrunya <psergey(a)askmonty.org> branch nick: maria-5.2 timestamp: Thu 2009-11-05 01:41:58 +0300 message: MWL#36: Add a mysqlbinlog option to change the used database - Merge to MariaDB 5.2 added: mysql-test/std_data/loaddata7.dat loaddata7.dat-20091016142045-b5fpf9tpydkmwy0i-1 mysql-test/suite/binlog/r/binlog_row_mysqlbinlog_options.result binlog_row_mysqlbinl-20091016142045-b5fpf9tpydkmwy0i-2 mysql-test/suite/binlog/t/binlog_row_mysqlbinlog_options.test binlog_row_mysqlbinl-20091016142045-b5fpf9tpydkmwy0i-3 modified: client/Makefile.am sp1f-makefile.am-19700101030959-zpaig4zjwqhmsob3ypilmrcrkq7pu6gf client/client_priv.h sp1f-client_priv.h-20010912205330-fzvv7eg77ywdut64ojoihwu3lhbabphc client/mysqlbinlog.cc sp1f-mysqlbinlog.cc-19700101030959-b3vgyo47ljent5mhbyj6ik33bi4bukad client/sql_string.cc sp1f-sql_string.cc-19700101030959-72h4johdz5mgcc6z2xceguohasr3pkrx client/sql_string.h sp1f-sql_string.h-19700101030959-oscd4vjvtuwjf22oosfpfmvc2f3km2dx sql/log_event.cc sp1f-log_event.cc-19700101030959-msmqlflsngxosswid2hpzxly5vfqdddc sql/log_event.h sp1f-log_event.h-19700101030959-clq6ett55tcqbpys2i4cpfrdccq7j4om sql/mysql_priv.h sp1f-mysql_priv.h-19700101030959-4fl65tqpop5zfgxaxkqotu2fa2ree5ci sql/mysqld.cc sp1f-mysqld.cc-19700101030959-zpswdvekpvixxzxf7gdtofzel7nywtfj sql/rpl_filter.cc sp1f-table_filter.cc-20050308201116-4anzb26smj76r56ihkpxzbtnzlzatr2k sql/rpl_filter.h sp1f-table_filter.h-20050308201116-ynbrrxlqijcoxfwgnipwjhgfw7vhdytw sql/sql_string.cc sp1f-sql_string.cc-19700101030959-yg6a37s5vnlpgxanuou6h2vbdmwbdxdx sql/sql_string.h sp1f-sql_string.h-19700101030959-nogsoxwv222e4vm4toic7cabeb4swajz sql/thr_malloc.cc sp1f-thr_malloc.cc-19700101030959-sg5hfbdswjtjtdjtmvppuc2eedch3p3s ------------------------------------------------------------ revno: 2643.51.10 revision-id: alexi1952(a)yandex.ru-20091027134247-2efxqunerou2hykv parent: psergey(a)askmonty.org-20091024214858-5whjzap4t0qvv41j committer: Alexander Ivanov <alexi1952(a)yandex.ru> branch nick: maria-wl36 timestamp: Tue 2009-10-27 16:42:47 +0300 message: MWL#36: Add a mysqlbinlog option to change the used database. Post-review changes: * modify mysqlbinlog's --help message for --rewrite-db option. * add test for mysqlbinlog with --read-from-remote-server option. modified: client/mysqlbinlog.cc sp1f-mysqlbinlog.cc-19700101030959-b3vgyo47ljent5mhbyj6ik33bi4bukad mysql-test/suite/binlog/r/binlog_row_mysqlbinlog_options.result binlog_row_mysqlbinl-20091016142045-b5fpf9tpydkmwy0i-2 mysql-test/suite/binlog/t/binlog_row_mysqlbinlog_options.test binlog_row_mysqlbinl-20091016142045-b5fpf9tpydkmwy0i-3 ------------------------------------------------------------ revno: 2643.51.9 revision-id: psergey(a)askmonty.org-20091024214858-5whjzap4t0qvv41j parent: psergey(a)askmonty.org-20091024194339-tauuslposi16t7uv committer: Sergey Petrunya <psergey(a)askmonty.org> branch nick: maria-5.1-wl36-r3 timestamp: Sun 2009-10-25 01:48:58 +0400 message: MWL#36: Add a mysqlbinlog option to change the used database - Apply Alexander Ivanov's fix to testsuite modified: mysql-test/suite/binlog/r/binlog_row_mysqlbinlog_options.result binlog_row_mysqlbinl-20091016142045-b5fpf9tpydkmwy0i-2 mysql-test/suite/binlog/t/binlog_row_mysqlbinlog_options.test binlog_row_mysqlbinl-20091016142045-b5fpf9tpydkmwy0i-3 ------------------------------------------------------------ revno: 2643.51.8 revision-id: psergey(a)askmonty.org-20091024194339-tauuslposi16t7uv parent: alexi1952(a)yandex.ru-20091017061257-v142dpem8x7rrh35 committer: Sergey Petrunya <psergey(a)askmonty.org> branch nick: maria-5.1-wl36-r3 timestamp: Sat 2009-10-24 23:43:39 +0400 message: MWL#36: Add a mysqlbinlog option to change the used database - Review fixes modified: client/Makefile.am sp1f-makefile.am-19700101030959-zpaig4zjwqhmsob3ypilmrcrkq7pu6gf client/mysqlbinlog.cc sp1f-mysqlbinlog.cc-19700101030959-b3vgyo47ljent5mhbyj6ik33bi4bukad client/sql_string.h sp1f-sql_string.h-19700101030959-oscd4vjvtuwjf22oosfpfmvc2f3km2dx sql/log_event.cc sp1f-log_event.cc-19700101030959-msmqlflsngxosswid2hpzxly5vfqdddc sql/log_event.h sp1f-log_event.h-19700101030959-clq6ett55tcqbpys2i4cpfrdccq7j4om sql/mysqld.cc sp1f-mysqld.cc-19700101030959-zpswdvekpvixxzxf7gdtofzel7nywtfj sql/rpl_filter.cc sp1f-table_filter.cc-20050308201116-4anzb26smj76r56ihkpxzbtnzlzatr2k sql/rpl_filter.h sp1f-table_filter.h-20050308201116-ynbrrxlqijcoxfwgnipwjhgfw7vhdytw sql/sql_string.cc sp1f-sql_string.cc-19700101030959-yg6a37s5vnlpgxanuou6h2vbdmwbdxdx sql/sql_string.h sp1f-sql_string.h-19700101030959-nogsoxwv222e4vm4toic7cabeb4swajz ------------------------------------------------------------ revno: 2643.51.7 revision-id: alexi1952(a)yandex.ru-20091017061257-v142dpem8x7rrh35 parent: alexi1952(a)yandex.ru-20091016150824-qs0h8jnskz26y56h committer: Alexander Ivanov <alexi1952(a)yandex.ru> branch nick: maria-wl36 timestamp: Sat 2009-10-17 10:12:57 +0400 message: After merge modified: sql/sql_string.cc sp1f-sql_string.cc-19700101030959-yg6a37s5vnlpgxanuou6h2vbdmwbdxdx ------------------------------------------------------------ revno: 2643.51.6 revision-id: alexi1952(a)yandex.ru-20091016150824-qs0h8jnskz26y56h parent: alexi1952(a)yandex.ru-20091016142256-ylt3xpab2qhgs8hw parent: monty(a)askmonty.org-20091014080956-d6xr2v3glk4v53sg committer: Alexander Ivanov <alexi1952(a)yandex.ru> branch nick: maria-wl36 timestamp: Fri 2009-10-16 19:08:24 +0400 message: MWL#36: Add a mysqlbinlog option to change the used database. Merge from trunk. added: BUILD/compile-pentium-debug-max-no-qc compilepentiumdebugm-20090923065542-7a3x9z3bgfxn1rac-1 mysql-test/r/mysqltest_ps.result mysqltest_ps.result-20091009074042-4ptmxutygql3xe53-1 mysql-test/suite/maria/t/maria_showlog_error-master.opt maria_showlog_errorm-20091013124404-b9hjoj2o0dtvoi0v-1 mysql-test/t/mysqltest_ps.test mysqltest_ps.test-20091009074042-4ptmxutygql3xe53-2 win/configure-mariadb.sh configuremariadb.sh-20091008190333-1t6vp376gcf6dk1x-1 win/make_mariadb_win_dist make_mariadb_win_dis-20091011101226-jxt5k56vsnmdwseb-1 modified: BUILD/SETUP.sh sp1f-setup.sh-20001218212418-itvzddls4bsqffggcsjklbawdmaxdhde Makefile.am sp1f-makefile.am-19700101030959-jbbpiygwpgybyqknlavdxxupbrjonu7h client/mysqltest.cc sp1f-mysqltest.c-20001010065317-ix4zw26srlev7yugcz455ux22zwyynyf configure.in sp1f-configure.in-19700101030959-mgdpoxtnh2ewmvusvfpkreuhwvffkcjw mysql-test/include/maria_empty_logs.inc sp1f-maria_empty_logs.inc-20071113161227-ndxpipzwp33lgtq7hd2ztaenz7talfmb mysql-test/include/read_many_rows.inc sp1f-innodbbig.test-20050407162409-2ijhhplhkad7ibufmxakqxeassbgr3u7 mysql-test/include/wait_for_status_var.inc wait_for_status_var.-20090403213228-jhk85sfwqs7rva2t-1 mysql-test/lib/My/ConfigFactory.pm sp1f-configfactory.pm-20071212171904-umibosyolpj2kzgk32rt5p6pl6vztmaq mysql-test/lib/mtr_cases.pm sp1f-mtr_cases.pl-20050203205008-rrteoawyobvgq6u7zeyce4tmuu334ayg mysql-test/mysql-test-run.pl sp1f-mysqltestrun.pl-20041230152716-xjnn5ndv4rr4by6ijmj5a4ysubxc7qh3 mysql-test/r/read_many_rows_innodb.result sp1f-innodbbig.result-20050407162408-3byrsilm6v4jwdzg6hckv4gvderpmwrz mysql-test/suite/funcs_1/datadict/processlist_priv.inc sp1f-processlist_priv.inc-20070815194641-da67ub26jjvt6hw6xorwk7twwxj6wofg mysql-test/suite/funcs_1/datadict/processlist_val.inc sp1f-processlist_val.inc-20070815194641-3hfsmyh3jr2gjhvgms52iydpkamdbnoz mysql-test/suite/funcs_1/r/is_columns_is.result sp1f-is_columns_is.result-20080307163304-7bd6seaxklddmff6f3bb54inlyw6unpw mysql-test/suite/funcs_1/r/is_tables_is.result sp1f-is_tables_is.result-20080307163304-6xl5vbegso6wet3dzoehxb645vntxpig mysql-test/suite/funcs_1/r/is_tables_myisam.result sp1f-is_tables_myisam.res-20080307163304-e4a7cotuvanfvlzekx7hbwyhb5f2gk73 mysql-test/suite/funcs_1/r/processlist_priv_no_prot.result sp1f-a_processlist_priv_n-20070815194641-dlvb2zym4djihvxmbsmib5bvpgytiuoe mysql-test/suite/funcs_1/r/processlist_val_no_prot.result sp1f-a_processlist_val_no-20070815194641-puz2vi6caqirb4nqzumushorqyuniraq mysql-test/suite/funcs_1/t/is_columns_is.test sp1f-is_columns_is.test-20080307163303-projhl5aviz6wflvnkcoukmxy6nra32g mysql-test/suite/maria/t/maria-recover.test mariarecover.test-20080602174033-rnr5wg8wn2bqarwk-1 mysql-test/suite/parts/t/partition_alter1_1_2_innodb.test sp1f-partition_alter1_1_2-20080513231047-73rswweuq6oir4o2u6svrq3sihowexyn mysql-test/suite/parts/t/partition_alter1_1_2_myisam.test sp1f-partition_alter1_1_2-20080513231047-bc3pgqtbhdmv3e6rhbbdit2nvwvry6dw mysql-test/suite/parts/t/partition_alter1_1_innodb.test sp1f-partition_alter1_1_i-20080513231047-3ir4v6ppntt2eadlbomo4hbwfe5sauay mysql-test/suite/parts/t/partition_alter1_1_myisam.test sp1f-partition_alter1_1_m-20080513231047-rgvbdni4nugq5zraq7vneewnlbdqdz5r mysql-test/suite/parts/t/partition_alter1_2_innodb.test sp1f-partition_alter1_2_i-20080513231047-g6yx4e6tufcbky2wq463xgngz66cuqnu mysql-test/suite/parts/t/partition_alter1_2_myisam.test sp1f-partition_alter1_2_m-20080513231047-rij737fg6r5v64noymxivu75d5ngreqx mysql-test/suite/parts/t/partition_alter2_1_innodb.test sp1f-partition_alter2_inn-20070206122238-btax2l7djjymz6hlnkmuvwdt2nbqbxed mysql-test/suite/parts/t/partition_alter2_1_maria.test partition_alter2_1_m-20081121141924-3a12t18v91jq9g3g-1 mysql-test/suite/parts/t/partition_alter2_1_myisam.test sp1f-partition_alter2_myi-20070206122238-yz7u7rjz5kz6n4vjdn7s5t7kxnzkkamf mysql-test/suite/parts/t/partition_alter2_2_innodb.test partition_alter2_2_i-20080908140941-884mge0s10lxgki2-3 mysql-test/suite/parts/t/partition_alter2_2_maria.test partition_alter2_2_m-20081121141924-3a12t18v91jq9g3g-2 mysql-test/suite/parts/t/partition_alter2_2_myisam.test partition_alter2_2_m-20080908140941-884mge0s10lxgki2-4 mysql-test/suite/parts/t/partition_alter3_innodb.test sp1f-partition_alter3_inn-20070206122238-i52aup4vehr4p7jhfzhvlhwb2m6txoca mysql-test/suite/parts/t/partition_alter3_myisam.test sp1f-partition_alter3_myi-20070206122238-pfd4wa7cixsoxjigbqeqlsamagdt64ff mysql-test/suite/parts/t/partition_alter4_innodb.test sp1f-partition_alter4_inn-20070206122238-wzdnxx4co5gutrm6mnbees2dkz7bwsii mysql-test/suite/parts/t/partition_alter4_myisam.test sp1f-partition_alter4_myi-20070206122238-54kwwhogzz65yuo74mwin2d5fe5rki2n mysql-test/suite/parts/t/partition_basic_innodb.test sp1f-partition_basic_inno-20070206122238-6o7mtftdh7bsyo3up576z5ng45jkkhed mysql-test/suite/parts/t/partition_basic_myisam.test sp1f-partition_basic_myis-20070206122238-w5z7ioipltrbhl6y4yt63ueuuf3zrv46 mysql-test/suite/parts/t/partition_basic_symlink_myisam.test sp1f-partition_basic_syml-20080206141352-td2zc3ehbfsoon3e6nfwhq5lrpe7s3mt mysql-test/suite/parts/t/partition_engine_innodb.test sp1f-partition_engine_inn-20070206122238-qfeexchp6zfhwipmml3m64jx334k7vj3 mysql-test/suite/parts/t/partition_engine_myisam.test sp1f-partition_engine_myi-20070206122238-zx4jgy2vyr2u3svn7ktw3xzvl72wyck6 mysql-test/suite/parts/t/partition_recover_myisam.test partition_repair_myi-20080609121315-mjya2e9ekn7bunzm-2 mysql-test/suite/parts/t/partition_special_innodb.test sp1f-partition_special_in-20070206122239-ric3u7bjjmi6nzboyfzdbfiepzkx6i7k mysql-test/suite/parts/t/partition_syntax_innodb.test sp1f-partition_syntax_inn-20070206122240-nflwppyghr62grue7rxvideeq3pt5lpb mysql-test/suite/parts/t/partition_syntax_myisam.test sp1f-partition_syntax_myi-20070206122240-gvuy3nivxgp6biwnztskv3gvsji3fzl3 mysql-test/suite/pbxt/r/connect.result connect.result-20090402100035-4ilk9i91sh65vjcb-21 mysql-test/suite/pbxt/r/mysqlslap.result mysqlslap.result-20090402100035-4ilk9i91sh65vjcb-102 mysql-test/suite/pbxt/r/ps_11bugs.result ps_11bugs.result-20090402100035-4ilk9i91sh65vjcb-129 mysql-test/suite/pbxt/r/select.result select.result-20090402100035-4ilk9i91sh65vjcb-139 mysql-test/suite/pbxt/r/union.result union.result-20090402100035-4ilk9i91sh65vjcb-166 mysql-test/suite/pbxt/t/connect.test connect.test-20090402100035-4ilk9i91sh65vjcb-188 mysql-test/t/variables-big.test sp1f-variablesbig.test-20070412220324-cmxsynsvdpjgbtnnanbt6pyrzz3xrnwo scripts/make_win_bin_dist sp1f-make_win_bin_dist-20060901123056-xnusgszvkfrrcxkqidb7zszax2ezpyto sql/mysql_priv.h sp1f-mysql_priv.h-19700101030959-4fl65tqpop5zfgxaxkqotu2fa2ree5ci sql/mysqld.cc sp1f-mysqld.cc-19700101030959-zpswdvekpvixxzxf7gdtofzel7nywtfj sql/sql_string.cc sp1f-sql_string.cc-19700101030959-yg6a37s5vnlpgxanuou6h2vbdmwbdxdx storage/maria/ha_maria.cc sp1f-ha_maria.cc-20060411134405-dmngb4v5x5fxlxhff527ud3etiutxuxk storage/maria/ma_state.c sp1f-ma_state.c-20080529153331-ttwxiq5ksyib6sdrdsdl2lnbbm362lwh storage/maria/maria_def.h sp1f-maria_def.h-20060411134454-urdx4joxwcwzxbmltpzejn53y2rgjs44 storage/myisam/ha_myisam.cc sp1f-ha_myisam.cc-19700101030959-7xzssylbn7zfz3nupnsw43wws6xlltsu storage/myisam/mi_locking.c sp1f-mi_locking.c-19700101030959-27f7n7juexzv4qrhkiwcoylzh4yfibbl storage/myisam/myisamdef.h sp1f-myisamdef.h-19700101030959-fzrxvpmzhzqfn5w2clasmcw7af4kanoa storage/pbxt/src/Makefile.am makefile.am-20090326121724-x683v32twzr3fi0y-13 win/Makefile.am sp1f-makefile.am-20060324215048-nfows6kg5gqazyzq4qztdyzddvzjmrfm win/configure.js sp1f-configure.js-20060131135210-xvfnytwaxztc3ytr6pmdtutht4i26rdu ------------------------------------------------------------ revno: 2643.51.5 revision-id: alexi1952(a)yandex.ru-20091016142256-ylt3xpab2qhgs8hw parent: alexi1952(a)yandex.ru-20091016140412-90gld44oqp6bv9cs committer: Alexander Ivanov <alexi1952(a)yandex.ru> branch nick: maria-wl36 timestamp: Fri 2009-10-16 18:22:56 +0400 message: MWL#36: Add a mysqlbinlog option to change the used database. Add test-suite. added: mysql-test/std_data/loaddata7.dat loaddata7.dat-20091016142045-b5fpf9tpydkmwy0i-1 mysql-test/suite/binlog/r/binlog_row_mysqlbinlog_options.result binlog_row_mysqlbinl-20091016142045-b5fpf9tpydkmwy0i-2 mysql-test/suite/binlog/t/binlog_row_mysqlbinlog_options.test binlog_row_mysqlbinl-20091016142045-b5fpf9tpydkmwy0i-3 ------------------------------------------------------------ revno: 2643.51.4 revision-id: alexi1952(a)yandex.ru-20091016140412-90gld44oqp6bv9cs parent: alexi1952(a)yandex.ru-20091016132813-yingxxg1m0lcb61p committer: Alexander Ivanov <alexi1952(a)yandex.ru> branch nick: maria-wl36 timestamp: Fri 2009-10-16 18:04:12 +0400 message: MWL#36: Add a mysqlbinlog option to change the used database. Add --rewrite-db option to mysqlbinlog. modified: client/client_priv.h sp1f-client_priv.h-20010912205330-fzvv7eg77ywdut64ojoihwu3lhbabphc client/mysqlbinlog.cc sp1f-mysqlbinlog.cc-19700101030959-b3vgyo47ljent5mhbyj6ik33bi4bukad ------------------------------------------------------------ revno: 2643.51.3 revision-id: alexi1952(a)yandex.ru-20091016132813-yingxxg1m0lcb61p parent: alexi1952(a)yandex.ru-20091016115816-tj67uvdvqrt8f6fz committer: Alexander Ivanov <alexi1952(a)yandex.ru> branch nick: maria-wl36 timestamp: Fri 2009-10-16 17:28:13 +0400 message: MWL#36: Add a mysqlbinlog option to change the used database. Add Rpl_filter to mysqlbinlog.cc Note. Though within MWL#36 we are going to use only two Rpl_filter's methods (add_db_rewrite and get_rewrite_db), we look forward for MWL#40 where Rpl_filter is likely to be used to its more extent. Note. Within MWL#36 we will not use Rpl_filter for supporting --database option: this option allows to specify only one database what doesn't correlate with Rpl_filter::add_do_db() (using this method will either appear "artificial" or require changing --database semantics). To be discussed within MWL#40. To add Rpl_filter we need: 1. include sql_string.h There are two instances of sql_string.* files - in sql and in client directories. We need to use the ones from the sql dir. 2. include sql_list.h This requires to define a client version of sql_alloc() function. 3. include rpl_filter.h This requires a definition of system_charset_info variable. Besides, Rpl_filter::tables_ok() refers to a TABLE_LIST structure which encounts deep non-client dependencies and can't be used here as is. On the other hand, tables_ok() make use only few TABLE_LIST's members and none of them depends on specific server context. This allows to redefine TABLE_LIST in a client context so that tables_ok() becomes admissible (surely it's a kind of hack but (at least currently) it's better than #ifndef'ing this method in Rpl_filter definition). Also add Rpl_filter::rewrite_db_is_empty() method. This is needed to be able to check that --rewrite-db is not used jointly with --base64-output= always (this is not supported - at least currently). modified: client/mysqlbinlog.cc sp1f-mysqlbinlog.cc-19700101030959-b3vgyo47ljent5mhbyj6ik33bi4bukad sql/rpl_filter.cc sp1f-table_filter.cc-20050308201116-4anzb26smj76r56ihkpxzbtnzlzatr2k sql/rpl_filter.h sp1f-table_filter.h-20050308201116-ynbrrxlqijcoxfwgnipwjhgfw7vhdytw ------------------------------------------------------------ revno: 2643.51.2 revision-id: alexi1952(a)yandex.ru-20091016115816-tj67uvdvqrt8f6fz parent: alexi1952(a)yandex.ru-20091016112009-il1xp74ksbm7i9ux committer: Alexander Ivanov <alexi1952(a)yandex.ru> branch nick: maria-wl36 timestamp: Fri 2009-10-16 15:58:16 +0400 message: MWL#36: Add a mysqlbinlog option to change the used database. Add rewrite_db() member to the Table_map_log_event class. Each RBR-event in binary log is preceded by a Table_map event containing a (db_id, db_name) pair. The rewrite_db(new_name) function replaces db_name by new_name in a buffer containing Table_map event read from the binary log. modified: sql/log_event.cc sp1f-log_event.cc-19700101030959-msmqlflsngxosswid2hpzxly5vfqdddc sql/log_event.h sp1f-log_event.h-19700101030959-clq6ett55tcqbpys2i4cpfrdccq7j4om ------------------------------------------------------------ revno: 2643.51.1 revision-id: alexi1952(a)yandex.ru-20091016112009-il1xp74ksbm7i9ux parent: knielsen(a)knielsen-hq.org-20091006183755-y08lid6fg8mnr8sk committer: Alexander Ivanov <alexi1952(a)yandex.ru> branch nick: maria-wl36 timestamp: Fri 2009-10-16 15:20:09 +0400 message: MWL#36: Add a mysqlbinlog option to change the used database. Make sql_alloc() declaration "public" for a client context. The reason is that sql_alloc() is used in definition of some common purpose stuff (e.g. sql_list.*). To make this stuff available for a client context we declare sql_alloc() as a "virtual function", i.e. as a function that is already declared but must be defined in this context (note that definition of sql_alloc() in thr_malloc.cc is #ifndef'ed for MYSQL_CLIENT). Also make sql_string.h repeatedly includable. modified: client/sql_string.cc sp1f-sql_string.cc-19700101030959-72h4johdz5mgcc6z2xceguohasr3pkrx sql/mysql_priv.h sp1f-mysql_priv.h-19700101030959-4fl65tqpop5zfgxaxkqotu2fa2ree5ci sql/sql_string.cc sp1f-sql_string.cc-19700101030959-yg6a37s5vnlpgxanuou6h2vbdmwbdxdx sql/sql_string.h sp1f-sql_string.h-19700101030959-nogsoxwv222e4vm4toic7cabeb4swajz sql/thr_malloc.cc sp1f-thr_malloc.cc-19700101030959-sg5hfbdswjtjtdjtmvppuc2eedch3p3s Diff too large for email (1159 lines, the limit is 1000).

1 0

[Maria-developers] MWL#24: Review feedback, first batch
by Sergey Petrunya 04 Nov '09

04 Nov '09

Hello Igor, Please find below the first portion of MWL#24 review feedback below. More to follow. > diff -urN --exclude='.*' maria-5.1-wl24-clean2/sql/opt_range.cc maria-5.1-wl24-nocompile/sql/opt_range.cc > --- maria-5.1-wl24-clean2/sql/opt_range.cc 2009-11-03 00:18:22.000000000 +0300 > +++ maria-5.1-wl24-nocompile/sql/opt_range.cc 2009-11-03 00:12:29.000000000 +0300 ... > @@ -809,96 +940,294 @@ > > > /* > - Perform OR operation on this SEL_IMERGE and supplied SEL_TREE new_tree, > - combining new_tree with one of the trees in this SEL_IMERGE if they both > - have SEL_ARGs for the same key. > + Check if any of the range trees of this imerge intersects with a given tree > > SYNOPSIS > - or_sel_tree_with_checks() > - param PARAM from SQL_SELECT::test_quick_select There is be no need for this parameter since sel_trees_have_common_keys() doesn't really need it. > - new_tree SEL_TREE with type KEY or KEY_SMALLER. > + have_common_keys() > + param Context info for the function > + tree SEL_TREE intersection with the imerge range trees is checked for Unfinished sentence? > > - NOTES > - This does the following: > - (t_1||...||t_k)||new_tree = > - either > - = (t_1||...||t_k||new_tree) > - or > - = (t_1||....||(t_j|| new_tree)||...||t_k), > + DESCRIPTION > + The function checks whether there is any range tree rt_i in this imerge > + such that there are some indexes for which ranges are defined in both > + rt_i and the range part of the SEL_TREE tree. > + To check this the function calls the function sel_trees_have_common_keys. > + > + RETURN > + TRUE if there are such range trees in this imerge > + FALSE otherwise > +*/ > + > +bool SEL_IMERGE::have_common_keys(RANGE_OPT_PARAM *param, SEL_TREE *tree) > +{ > + for (SEL_TREE** or_tree= trees, **bound= trees_next; > + or_tree != bound; or_tree++) > + { > + key_map common_keys; > + if (sel_trees_have_common_keys(param, *or_tree, tree, &common_keys)) > + return TRUE; > + } > + return FALSE; > +} > > - where t_i, y are SEL_TREEs. > - new_tree is combined with the first t_j it has a SEL_ARG on common > - key with. As a consequence of this, choice of keys to do index_merge > - read may depend on the order of conditions in WHERE part of the query. > > +/* > + Perform AND operation for this imerge and the range part of a tree > + > + SYNOPSIS > + and_sel_tree() > + param Context info for the operation > + tree SEL_TREE for the second operand of the operation > + new_imerge OUT imerge for the result of the operation > + > + DESCRIPTION > + This function performs AND operation for this imerge m and the > + range part of the SEL_TREE tree rt. In other words the function ^^^^ note that here it is 'rt' while further on it is 'RT'. > + pushes rt into this imerge. The resulting imerge is returned in > + the parameter new_imerge. > + If this imerge m represent the formula > + RT_1 OR ... OR RT_k > + then the resulting imerge of the function represents the formula > + (RT_1 AND RT) OR ... (RT_k AND RT) ^^^ please add 'OR' here. Overall, my personal opinion is that we should aim for more siccint comments, for example, for this function the comment could be: <comment> This function performs an AND operation between this SEL_IMERGE object and a given SEL_TREE object. It performs the following conversion: (rt_1 OR ... OR rt_k) AND rt -> (rt_1 AND rt) OR ... OR (rt_k AND rt) We call and_range_trees() function to construct a SEL_TREE for (rt_i AND rt). </comment> > + The function calls the function and_range_trees to construct the > + range tree representing (RT_i AND RT). imho there is not much merit in providing details like that (or I'm missing something important here) > + > + NOTE > + The function may return an empty imerge without any range trees while here it would be helpful to have another sentense explaining when this can happen and what does returning empty imerge means. (does it mean that there is no records that would match both given SEL_IMERGE and SEL_TREE?) > + > RETURN > - 0 OK > - 1 One of the trees was combined with new_tree to SEL_TREE::ALWAYS, > - and (*this) should be discarded. > - -1 An error occurred. > + 0 if the operation is a success > + -1 otherwise: there is not enough memory to perform the operation > */ > > -int SEL_IMERGE::or_sel_tree_with_checks(RANGE_OPT_PARAM *param, SEL_TREE *new_tree) > +int SEL_IMERGE::and_sel_tree(RANGE_OPT_PARAM *param, SEL_TREE *tree, > + SEL_IMERGE *new_imerge) If the function can only return two possible values, can its return type be changed to bool? > { > - for (SEL_TREE** tree = trees; > - tree != trees_next; > - tree++) > + for (SEL_TREE** or_tree= trees; or_tree != trees_next; or_tree++) > { > - if (sel_trees_can_be_ored(*tree, new_tree, param)) > + SEL_TREE *result= 0; The name 'result' is typically used to store the result of entire function. This function will typically produce multiple 'results', which is confusing. Please rename the variable. > + key_map result_keys; > + if (!(result= new SEL_TREE())) > + return (-1); > + if (!and_range_trees(param, *or_tree, tree, result)) > { > - *tree = tree_or(param, *tree, new_tree); > - if (!*tree) > - return 1; > - if (((*tree)->type == SEL_TREE::MAYBE) || > - ((*tree)->type == SEL_TREE::ALWAYS)) > + if (new_imerge->or_sel_tree(param, result)) > + return (-1); > + } > + } > + return 0; > +} > + > + > +/* > + Perform OR operation on this imerge and the range part of a tree > + > + SYNOPSIS > + or_sel_tree_with_checks() > + param Context info for the operation > + tree SEL_TREE whose range part is to be ored > + is_first_check_pass <=> the first call of the function for this imerge > + is_last_check_pass OUT <=> no more calls of the function for this imerge > + > + DESCRIPTION > + The functions performs OR operation on this imerge m and the range part "functions performs", bad grammar. Why denote the imerge as "m" if we don't use "m" later on? > + of the SEL_TREE tree rt. It always replaces this imerge with the result > + of the operation. > + > + The operation can be performed in two different modes: with > + is_first_check_pass==TRUE and is_first_check_pass==FALSE, transforming > + this imerge differently. > + > + Given this imerge represents the formula > + RT_1 OR ... OR RT_k: > + > + 1. In the first mode, when is_first_check_pass==TRUE : Is there really a need to be be so verbose, s/In the first mode,// ? > + 1.1. If rt must be ored(see the function sel_trees_must_be_ored) with Would it be shorter if "see the function sel_trees_must_be_ored" was changed to "see sel_trees_must_be_ored()"? > + some rt_j (there may be only one such range tree in the imerge) > + then the function produces the imerge representing the formula s/the imerge/an imerge/ ? > + RT_1 OR ... OR (RT_j OR RT) OR ... OR RT_k, > + where the tree for (RT_j OR RT) is built by oring the pairs s/oring/ORing/ ? > + of SEL_ARG trees for the corresponding indexes > + 1.2. Otherwise the function produces the imerge representing the formula: > + RT_1 OR ... OR RT_k OR RT. > + > + 2. In the second mode, when is_first_check_pass==FALSE : > + 2.1. For each rt_j in the imerge that can be ored (see the function > + sel_trees_can_be_ored), but not must be ored, with rt the function > + replaces rt_j for a range tree such that for each index for which > + ranges are defined in both in rt_j and rt the tree contains the > + result of oring of these ranges. > + 2.2. In other cases the function does not produce any imerge. > + > + When is_first_check==TRUE the function returns FALSE in the parameter > + is_last_check_pass if there is no rt_j such that rt_j can be ored with rt, > + but, at the same time, it's not true that rt_j must be ored with rt. > + When is_first_check==FALSE the function always returns FALSE in the > + parameter is_last_check_pass. > + > + RETURN > + 1 The result of oring of rt_j and rt that must be ored returns the > + the range tree with type==SEL_TREE::ALWAYS > + (in this case the imerge m should be discarded) > + -1 The function runs out of memory > + 0 in all other cases > +*/ > + > +int SEL_IMERGE::or_sel_tree_with_checks(RANGE_OPT_PARAM *param, > + SEL_TREE *tree, > + bool is_first_check_pass, > + bool *is_last_check_pass) > +{ > + *is_last_check_pass= TRUE; > + for (SEL_TREE** or_tree = trees; > + or_tree != trees_next; > + or_tree++) > + { > + SEL_TREE *result= 0; > + key_map result_keys; > + key_map ored_keys; > + if (sel_trees_can_be_ored(param, *or_tree, tree, &ored_keys)) > + { > + bool must_be_ored= sel_trees_must_be_ored(param, *or_tree, tree, > + ored_keys); > + if (must_be_ored == is_first_check_pass) { ^^^^^^ Coding style violation > + result_keys.clear_all(); > + result= *or_tree; > + for (uint key_no= 0; key_no < param->keys; key_no++) > + { > + if (!ored_keys.is_set(key_no)) > + { > + result->keys[key_no]= 0; > + continue; > + } > + SEL_ARG *key1= (*or_tree)->keys[key_no]; > + SEL_ARG *key2= tree->keys[key_no]; > + key2->incr_refs(); > + if ((result->keys[key_no]= key_or(param, key1, key2))) > + { > + > + result_keys.set_bit(key_no); > +#ifdef EXTRA_DEBUG > + if (param->alloced_sel_args < SEL_ARG::MAX_SEL_ARGS) > + { > + key1= result->keys[key_no]; > + (key1)->test_use_count(key1); > + } > +#endif > + } > + } > + } > + else if(is_first_check_pass) > + *is_last_check_pass= FALSE; > + } > + > + if (result) > + { > + if (result_keys.is_clear_all()) > + result->type= SEL_TREE::ALWAYS; > + *is_last_check_pass= TRUE; > + if ((result->type == SEL_TREE::MAYBE) || > + (result->type == SEL_TREE::ALWAYS)) > return 1; > /* SEL_TREE::IMPOSSIBLE is impossible here */ > - return 0; > + result->keys_map= result_keys; > + *or_tree= result; > + if (is_first_check_pass) > + return 0; > } > } > + if (!is_first_check_pass) > + return 0; > > - /* New tree cannot be combined with any of existing trees. */ > - return or_sel_tree(param, new_tree); > + if (!*is_last_check_pass && > + !(tree= new SEL_TREE(tree, FALSE, param))) > + return (-1); > + return or_sel_tree(param, tree); > } > > > /* > - Perform OR operation on this index_merge and supplied index_merge list. > + Perform OR operation on this imerge and and another imerge > + > + SYNOPSIS > + or_sel_tree_with_checks() > + param Context info for the operation > + imerge The second operand of the operation > + is_first_check_pass <=> the first call of the function for this imerge > + is_last_check_pass OUT <=> no more calls of the function for this imerge > > + DESCRIPTION > + For each range tree rt from 'imerge' the function calls the method > + SEL_IMERGE::or_sel_tree_with_checks that performs OR operation on this > + SEL_IMERGE object m and the tree rt. The mode of the operation is > + specified by the parameter is_first_check_pass. Each call of > + SEL_IMERGE::or_sel_tree_with_checks transforms this SEL_IMERGE object m. > + The function returns FALSE in the prameter is_last_check_pass if > + at least one of the calls of SEL_IMERGE::or_sel_tree_with_checks > + returns FALSE as the value of its last parameter. > + > RETURN > - 0 - OK > - 1 - One of conditions in result is always TRUE and this SEL_IMERGE > - should be discarded. > - -1 - An error occurred > + 1 One of the calls of SEL_IMERGE::or_sel_tree_with_checks returns 1. > + (in this case the imerge m should be discarded) > + -1 The function runs out of memory > + 0 in all other cases > */ > > -int SEL_IMERGE::or_sel_imerge_with_checks(RANGE_OPT_PARAM *param, SEL_IMERGE* imerge) > +int SEL_IMERGE::or_sel_imerge_with_checks(RANGE_OPT_PARAM *param, > + SEL_IMERGE* imerge, > + bool is_first_check_pass, > + bool *is_last_check_pass) > { > + *is_last_check_pass= TRUE; > for (SEL_TREE** tree= imerge->trees; > tree != imerge->trees_next; > tree++) > { > - if (or_sel_tree_with_checks(param, *tree)) > - return 1; > + uint rc; > + bool is_last; > + rc= or_sel_tree_with_checks(param, *tree, is_first_check_pass, &is_last); > + if (!is_last) > + *is_last_check_pass= FALSE; > + if (rc) > + return rc; > } > return 0; > } > > > -SEL_TREE::SEL_TREE(SEL_TREE *arg, RANGE_OPT_PARAM *param): Sql_alloc() > +/* > + Copy constructor for SEL_TREE objects > + > + SYNOPSIS > + SEL_TREE > + arg The source tree for the constructor > + without_merges <=> only the range part of the tree arg is copied > + param Context info for the operation > + > + DESCRIPTION > + The constructor creates a full copy of the SEL_TREE arg if > + the prameter without_merges==FALSE. Otherwise a tree is created > + that contains the copy only of the range part of the tree arg. > +*/ > + > +SEL_TREE::SEL_TREE(SEL_TREE *arg, bool without_merges, > + RANGE_OPT_PARAM *param): Sql_alloc() > { > keys_map= arg->keys_map; > type= arg->type; > - for (int idx= 0; idx < MAX_KEY; idx++) > + for (uint idx= 0; idx < param->keys; idx++) > { > if ((keys[idx]= arg->keys[idx])) > - keys[idx]->increment_use_count(1); > + keys[idx]->incr_refs(); > } > > + if (without_merges) > + return; > + > List_iterator<SEL_IMERGE> it(arg->merges); > for (SEL_IMERGE *el= it++; el; el= it++) > { > - SEL_IMERGE *merge= new SEL_IMERGE(el, param); > + SEL_IMERGE *merge= new SEL_IMERGE(el, 0, param); > if (!merge || merge->trees == merge->trees_next) > { > merges.empty(); > @@ -909,7 +1238,23 @@ > } > > > -SEL_IMERGE::SEL_IMERGE (SEL_IMERGE *arg, RANGE_OPT_PARAM *param) : Sql_alloc() > +/* > + Copy constructor for SEL_IMERGE objects > + > + SYNOPSIS > + SEL_IMERGE > + arg The source imerge for the constructor > + cnt How many trees from arg are to be copied > + param Context info for the operation > + > + DESCRIPTION > + The cnt==0 then the constructor creates a full copy of the > + imerge arg. Otherwise only the first cnt trees of the imerge > + are copied. > +*/ > + > +SEL_IMERGE::SEL_IMERGE(SEL_IMERGE *arg, uint cnt, > + RANGE_OPT_PARAM *param) : Sql_alloc() > { > uint elements= (arg->trees_end - arg->trees); > if (elements > PREALLOCED_TREES) > @@ -921,13 +1266,13 @@ > else > trees= &trees_prealloced[0]; > > - trees_next= trees; > + trees_next= trees + (cnt ? cnt : arg->trees_next-arg->trees); > trees_end= trees + elements; > > - for (SEL_TREE **tree = trees, **arg_tree= arg->trees; tree < trees_end; > + for (SEL_TREE **tree = trees, **arg_tree= arg->trees; tree < trees_next; > tree++, arg_tree++) > { > - if (!(*tree= new SEL_TREE(*arg_tree, param))) > + if (!(*tree= new SEL_TREE(*arg_tree, FALSE, param))) > goto mem_err; > } > > @@ -941,7 +1286,19 @@ > > > /* > - Perform AND operation on two index_merge lists and store result in *im1. > + Perform AND operation on two imerge lists > + > + SYNOPSIS > + imerge_list_and_list() > + param Context info for the operation > + im1 The first imerge list for the operation > + im2 The second imerge list for the operation > + > + DESCRIPTION > + The function just appends the imerge list im2 to the imerge list im1 > + > + RETURN VALUE > + none > */ > > inline void imerge_list_and_list(List<SEL_IMERGE> *im1, List<SEL_IMERGE> *im2) > @@ -951,73 +1308,242 @@ > > > /* > - Perform OR operation on 2 index_merge lists, storing result in first list. > - > - NOTES > - The following conversion is implemented: > - (a_1 &&...&& a_N)||(b_1 &&...&& b_K) = AND_i,j(a_i || b_j) => > - => (a_1||b_1). > - > - i.e. all conjuncts except the first one are currently dropped. > - This is done to avoid producing N*K ways to do index_merge. > + Perform OR operation on two imerge lists > > - If (a_1||b_1) produce a condition that is always TRUE, NULL is returned > - and index_merge is discarded (while it is actually possible to try > - harder). > - > - As a consequence of this, choice of keys to do index_merge read may depend > - on the order of conditions in WHERE part of the query. > + SYNOPSIS > + imerge_list_or_list() > + param Context info for the operation > + im1 The first imerge list for the operation > + im2 The second imerge list for the operation > + > + DESCRIPTION > + Assuming that the first imerge list represents the formula > + F1= M1_1 AND ... AND M1_k1 > + while the second imerge list represents the formula > + F2= M2_1 AND ... AND M2_k2, > + where M1_i= RT1_i_1 OR ... OR RT1_i_l1i (i in [1..k1]) > + and M2_i = RT2_i_1 OR ... OR RT2_i_l2i (i in [1..k2]), > + the function builds a list of imerges for some formula that can be > + inferred from the formula (F1 OR F2). > + > + More exactly the function builds imerges for the formula (M1_1 OR M2_1). > + Note that > + (F1 OR F2) = (M1_1 AND ... AND M1_k1) OR (M2_1 AND ... AND M2_k2) = > + AND (M1_i OR M2_j) (i in [1..k1], j in [1..k2]) => > + M1_1 OR M2_1. > + So (M1_1 OR M2_1) is indeed an inference formula for (F1 OR F2). > + > + To build imerges for the formula (M1_1 OR M2_1) the function invokes, > + possibly twice, the method SEL_IMERGE::or_sel_imerge_with_checks > + for the imerge m1_1. > + At its first invocation the method SEL_IMERGE::or_sel_imerge_with_checks > + performs OR operation on the imerge m1_1 and the range tree rt2_1_1 by > + calling SEL_IMERGE::or_sel_tree_with_checks with is_first_pass_check==TRUE. > + The resulting imerge of the operation is ored with the next range tree of > + the imerge m2_1. This oring continues until the last range tree from > + m2_1 has been ored. > + At its second invocation the method SEL_IMERGE::or_sel_imerge_with_checks > + performs the same sequence of OR operations, but now calling > + SEL_IMERGE::or_sel_tree_with_checks with is_first_pass_check==FALSE. > > + The imerges that the operation produces replace those in the list im1 > + > RETURN > - 0 OK, result is stored in *im1 > - other Error, both passed lists are unusable > + 0 if the operation is a success > + -1 if the function has run out of memory > */ > > int imerge_list_or_list(RANGE_OPT_PARAM *param, > List<SEL_IMERGE> *im1, > List<SEL_IMERGE> *im2) > { > + > + uint rc; > + bool is_last_check_pass= FALSE; > + > SEL_IMERGE *imerge= im1->head(); > + uint elems= imerge->trees_next-imerge->trees; > im1->empty(); > im1->push_back(imerge); > > - return imerge->or_sel_imerge_with_checks(param, im2->head()); > + rc= imerge->or_sel_imerge_with_checks(param, im2->head(), > + TRUE, &is_last_check_pass); > + if (rc) > + { > + if (rc == 1) > + { > + im1->empty(); > + rc= 0; > + } > + return rc; > + } > + > + if (!is_last_check_pass) > + { > + SEL_IMERGE* new_imerge= new SEL_IMERGE(imerge, elems, param); > + if (new_imerge) > + { > + is_last_check_pass= TRUE; > + rc= new_imerge->or_sel_imerge_with_checks(param, im2->head(), > + FALSE, &is_last_check_pass); > + if (!rc) > + im1->push_back(new_imerge); > + } > + } > + return rc; > } > > > /* > - Perform OR operation on index_merge list and key tree. > + Perform OR operation for each imerge from a list and the range part of a tree > + > + SYNOPSIS > + imerge_list_or_tree() > + param Context info for the operation > + merges The list of imerges to be ored with the range part of tree > + tree SEL_TREE whose range part is to be ored with the imerges > > + DESCRIPTION > + For each imerge mi from the list 'merges' the function performes OR > + operation with mi and the range part of 'tree' rt, producing one or > + two imerges. > + > + Given the merge mi represent the formula RTi_1 OR ... OR RTi_k, > + the function forms the merges by the following rules: > + > + 1. If rt cannot be ored with any of the trees rti the function just > + produces an imerge that represents the formula > + RTi_1 OR ... RTi_k OR RT. > + 2. If there exist a tree rtj that must be ored with rt the function > + produces an imerge the represents the formula > + RTi_1 OR ... OR (RTi_j OR RT) OR ... OR RTi_k, > + where the range tree for (RTi_j OR RT) is constructed by oring the > + SEL_ARG trees that must be ored. > + 3. For each rti_j that can be ored with rt the function produces > + the new tree rti_j' and substitutes rti_j for this new range tree. > + > + In any case the function removes mi from the list and then adds all > + produced imerges. > + > + To build imerges by rules 1-3 the function calls the method > + SEL_IMERGE::or_sel_tree_with_checks, possibly twice. With the first > + call it passes TRUE for the third parameter of the function. > + At this first call imerges by rules 1-2 are built. If the call > + returns FALSE as the return value of its fourth parameter then the > + function are called for the second time. At this call the imerge > + of rule 3 is produced. > + > + If a call of SEL_IMERGE::or_sel_tree_with_checks returns 1 then > + then it means that the produced tree contains an always true > + range tree and the whole imerge can be discarded. > + > RETURN > - 0 OK, result is stored in *im1. > - other Error > + 1 if no imerges are produced > + 0 otherwise > */ > > +static > int imerge_list_or_tree(RANGE_OPT_PARAM *param, > - List<SEL_IMERGE> *im1, > + List<SEL_IMERGE> *merges, > SEL_TREE *tree) > { > + > SEL_IMERGE *imerge; > - List_iterator<SEL_IMERGE> it(*im1); > - bool tree_used= FALSE; > + List<SEL_IMERGE> additional_merges; > + List_iterator<SEL_IMERGE> it(*merges); > + > while ((imerge= it++)) > { > - SEL_TREE *or_tree; > - if (tree_used) > - { > - or_tree= new SEL_TREE (tree, param); > - if (!or_tree || > - (or_tree->keys_map.is_clear_all() && or_tree->merges.is_empty())) > - return FALSE; > + bool is_last_check_pass; > + int rc= 0; > + int rc1= 0; > + SEL_TREE *or_tree= new SEL_TREE (tree, FALSE, param); > + if (or_tree) > + { > + uint elems= imerge->trees_next-imerge->trees; > + rc= imerge->or_sel_tree_with_checks(param, or_tree, > + TRUE, &is_last_check_pass); > + if (!is_last_check_pass) > + { > + SEL_IMERGE *new_imerge= new SEL_IMERGE(imerge, elems, param); > + if (new_imerge) > + { > + rc1= new_imerge->or_sel_tree_with_checks(param, or_tree, > + FALSE, &is_last_check_pass); > + if (!rc1) > + additional_merges.push_back(new_imerge); > + } > + } > } > - else > - or_tree= tree; > - > - if (imerge->or_sel_tree_with_checks(param, or_tree)) > + if (rc || rc1 || !or_tree) > it.remove(); > - tree_used= TRUE; > } > - return im1->is_empty(); > + > + merges->concat(&additional_merges); > + return merges->is_empty(); > +} > + > + > +/* > + Perform pushdown operation of the range part of a tree into given imerges > + > + SYNOPSIS > + imerge_list_and_tree() > + param Context info for the operation > + merges IN/OUT List of imerges to push the range part of 'tree' into > + tree SEL_TREE whose range part is to be pushed into imerges > + > + DESCRIPTION > + For each imerge from the list merges the function pushes the range part > + rt of 'tree' into the imerge. > + More exactly if the imerge mi from the list represents the formula > + RTi_1 OR ... OR RTi_k > + the function bulds a new imerge that represents the formula > + (RTi_1 AND RT) OR ... OR (RTi_k AND RT) Where the "RT" is? I think the comment would be much easier to read if the conversion was explicitly specified. > + and adds this imerge to the list merges. > + To perform this pushdown operation the function calls the method > + SEL_IMERGE::and_sel_tree. > + For any imerge mi the new imerge is not created if for each pair of > + trees rti_j and rt the intersection of the indexes with defined ranges > + is empty. > + If the result of the pushdown operation for the imerge mi returns an > + imerge with no trees then then not only nothing is added to the list > + merges but mi itself is removed from the list. > + > + RETURN > + 1 if no imerges are left in the list merges > + 0 otherwise > +*/ > + > +static > +int imerge_list_and_tree(RANGE_OPT_PARAM *param, > + List<SEL_IMERGE> *merges, > + SEL_TREE *tree) > +{ > + SEL_IMERGE *imerge; > + SEL_IMERGE *new_imerge= NULL; > + List<SEL_IMERGE> new_merges; > + List_iterator<SEL_IMERGE> it(*merges); > + > + while ((imerge= it++)) > + { > + if (!new_imerge) > + new_imerge= new SEL_IMERGE(); > + if (imerge->have_common_keys(param, tree) && > + new_imerge && !imerge->and_sel_tree(param, tree, new_imerge)) > + { > + if (new_imerge->trees == new_imerge->trees_next) > + it.remove(); > + else > + { > + new_merges.push_back(new_imerge); > + new_imerge= NULL; > + } > + } > + } > + imerge_list_and_list(&new_merges, merges); > + *merges= new_merges; > + return merges->is_empty(); > } > > > @@ -1651,6 +2177,7 @@ > min_value=arg.min_value; > max_value=arg.max_value; > next_key_part=arg.next_key_part; > + max_part_no= arg.max_part_no; > use_count=1; elements=1; > } > > @@ -1668,9 +2195,10 @@ > :min_flag(0), max_flag(0), maybe_flag(0), maybe_null(f->real_maybe_null()), > elements(1), use_count(1), field(f), min_value((uchar*) min_value_arg), > max_value((uchar*) max_value_arg), next(0),prev(0), > - next_key_part(0),color(BLACK),type(KEY_RANGE) > + next_key_part(0), color(BLACK), type(KEY_RANGE) > { > left=right= &null_element; > + max_part_no= 1; > } > > SEL_ARG::SEL_ARG(Field *field_,uint8 part_, > @@ -1681,6 +2209,7 @@ > field(field_), min_value(min_value_), max_value(max_value_), > next(0),prev(0),next_key_part(0),color(BLACK),type(KEY_RANGE) > { > + max_part_no= part+1; > left=right= &null_element; > } > > @@ -1723,6 +2252,7 @@ > increment_use_count(1); > tmp->color= color; > tmp->elements= this->elements; > + tmp->max_part_no= max_part_no; > return tmp; > } > > @@ -2367,72 +2897,73 @@ > It is possible to use a range-based quick select (but it might be > slower than 'all' table scan). > */ > - if (tree->merges.is_empty()) > + TRP_RANGE *range_trp; > + TRP_ROR_INTERSECT *rori_trp; > + bool can_build_covering= FALSE; > + > + remove_nonrange_trees(&param, tree); > + > + /* Get best 'range' plan and prepare data for making other plans */ > + if ((range_trp= get_key_scans_params(&param, tree, FALSE, TRUE, > + best_read_time))) > { > - TRP_RANGE *range_trp; > - TRP_ROR_INTERSECT *rori_trp; > - bool can_build_covering= FALSE; > - > - /* Get best 'range' plan and prepare data for making other plans */ > - if ((range_trp= get_key_scans_params(&param, tree, FALSE, TRUE, > - best_read_time))) > - { > - best_trp= range_trp; > - best_read_time= best_trp->read_cost; > - } > + best_trp= range_trp; > + best_read_time= best_trp->read_cost; > + } > > + /* > + Simultaneous key scans and row deletes on several handler > + objects are not allowed so don't use ROR-intersection for > + table deletes. > + */ > + if ((thd->lex->sql_command != SQLCOM_DELETE) && > + optimizer_flag(thd, OPTIMIZER_SWITCH_INDEX_MERGE)) > + { > /* > - Simultaneous key scans and row deletes on several handler > - objects are not allowed so don't use ROR-intersection for > - table deletes. > + Get best non-covering ROR-intersection plan and prepare data for > + building covering ROR-intersection. > */ > - if ((thd->lex->sql_command != SQLCOM_DELETE) && > - optimizer_flag(thd, OPTIMIZER_SWITCH_INDEX_MERGE)) > + if ((rori_trp= get_best_ror_intersect(&param, tree, best_read_time, > + &can_build_covering))) > { > + best_trp= rori_trp; > + best_read_time= best_trp->read_cost; > /* > - Get best non-covering ROR-intersection plan and prepare data for > - building covering ROR-intersection. > + Try constructing covering ROR-intersect only if it looks possible > + and worth doing. > */ > - if ((rori_trp= get_best_ror_intersect(&param, tree, best_read_time, > - &can_build_covering))) > - { > + if (!rori_trp->is_covering && can_build_covering && > + (rori_trp= get_best_covering_ror_intersect(&param, tree, > + best_read_time))) > best_trp= rori_trp; > - best_read_time= best_trp->read_cost; > - /* > - Try constructing covering ROR-intersect only if it looks possible > - and worth doing. > - */ > - if (!rori_trp->is_covering && can_build_covering && > - (rori_trp= get_best_covering_ror_intersect(&param, tree, > - best_read_time))) > - best_trp= rori_trp; > - } > } > } > - else > + > + if (optimizer_flag(thd, OPTIMIZER_SWITCH_INDEX_MERGE)) > { > - if (optimizer_flag(thd, OPTIMIZER_SWITCH_INDEX_MERGE)) > + /* Try creating index_merge/ROR-union scan. */ > + SEL_IMERGE *imerge; > + TABLE_READ_PLAN *best_conj_trp= NULL, *new_conj_trp; > + LINT_INIT(new_conj_trp); /* no empty index_merge lists possible */ > + DBUG_PRINT("info",("No range reads possible," > + " trying to construct index_merge")); > + List_iterator_fast<SEL_IMERGE> it(tree->merges); > + while ((imerge= it++)) > { > - /* Try creating index_merge/ROR-union scan. */ > - SEL_IMERGE *imerge; > - TABLE_READ_PLAN *best_conj_trp= NULL, *new_conj_trp; > - LINT_INIT(new_conj_trp); /* no empty index_merge lists possible */ > - DBUG_PRINT("info",("No range reads possible," > - " trying to construct index_merge")); > - List_iterator_fast<SEL_IMERGE> it(tree->merges); > - while ((imerge= it++)) > + new_conj_trp= get_best_disjunct_quick(&param, imerge, best_read_time); > + if (new_conj_trp) > + set_if_smaller(param.table->quick_condition_rows, > + new_conj_trp->records); > + if (new_conj_trp && > + (!best_conj_trp || > + new_conj_trp->read_cost < best_conj_trp->read_cost)) > { > - new_conj_trp= get_best_disjunct_quick(&param, imerge, best_read_time); > - if (new_conj_trp) > - set_if_smaller(param.table->quick_condition_rows, > - new_conj_trp->records); > - if (!best_conj_trp || (new_conj_trp && new_conj_trp->read_cost < > - best_conj_trp->read_cost)) > - best_conj_trp= new_conj_trp; > + best_conj_trp= new_conj_trp; > + best_read_time= best_conj_trp->read_cost; > } > - if (best_conj_trp) > - best_trp= best_conj_trp; > } > + if (best_conj_trp) > + best_trp= best_conj_trp; > } > } > > @@ -3696,7 +4227,6 @@ > { > SEL_TREE **ptree; > TRP_INDEX_MERGE *imerge_trp= NULL; > - uint n_child_scans= imerge->trees_next - imerge->trees; > TRP_RANGE **range_scans; > TRP_RANGE **cur_child; > TRP_RANGE **cpk_scan= NULL; > @@ -3716,6 +4246,24 @@ > DBUG_ENTER("get_best_disjunct_quick"); > DBUG_PRINT("info", ("Full table scan cost: %g", read_time)); > > + /* > + In every tree of imerge remove SEL_ARG trees that do not make ranges. > + If after this removal some SEL_ARG tree becomes empty discard imerge. > + */ > + for (ptree= imerge->trees; ptree != imerge->trees_next; ptree++) > + { > + if (remove_nonrange_trees(param, *ptree)) > + { > + imerge->trees_next= imerge->trees; > + break; > + } > + } > + > + uint n_child_scans= imerge->trees_next - imerge->trees; > + > + if (!n_child_scans) > + DBUG_RETURN(NULL); > + > if (!(range_scans= (TRP_RANGE**)alloc_root(param->mem_root, > sizeof(TRP_RANGE*)* > n_child_scans))) > @@ -3835,6 +4383,13 @@ > imerge_trp->range_scans_end= range_scans + n_child_scans; > read_time= imerge_cost; > } > + if (imerge_trp) > + { > + TABLE_READ_PLAN *trp= merge_same_index_scans(param, imerge, imerge_trp, > + read_time); > + if (trp != imerge_trp) > + DBUG_RETURN(trp); > + } > } > > build_ror_index_merge: > @@ -3850,6 +4405,7 @@ > sizeof(TABLE_READ_PLAN*)* > n_child_scans))) > DBUG_RETURN(imerge_trp); > + > skip_to_ror_scan: > roru_index_costs= 0.0; > roru_total_records= 0; > @@ -3933,7 +4489,116 @@ > DBUG_RETURN(roru); > } > } > - DBUG_RETURN(imerge_trp); > + DBUG_RETURN(imerge_trp); > +} > + > + > +/* > + Merge index scans for the same indexes in an index merge plan > + > + SYNOPSIS > + merge_same_index_scans() > + param Context info for the operation > + imerge IN/OUT SEL_IMERGE from which imerge_trp has been extracted > + imerge_trp The index merge plan where index scans for the same > + indexes are to be merges > + read_time The upper bound for the cost of the plan to be evaluated > + > + DESRIPTION > + For the given index merge plan imerge_trp extracted from the SEL_MERGE > + imerge the function looks for range scans with the same indexes and merges > + them into SEL_ARG trees. Then for each such SEL_ARG tree r_i the function > + creates a range tree rt_i that contains only r_i. All rt_i are joined > + into one index merge that replaces the original index merge imerge. > + The function calls get_best_disjunct_quick for the new index merge to > + get a new index merge plan that contains index scans only for different > + indexes. > + If there are no index scans for the same index in the original index > + merge plan the function does not change the original imerge and returns > + imerge_trp as its result. > + > + RETURN > + The original or or improved index merge plan > +*/ > + > +static > +TABLE_READ_PLAN *merge_same_index_scans(PARAM *param, SEL_IMERGE *imerge, > + TRP_INDEX_MERGE *imerge_trp, > + double read_time) > +{ > + uint16 first_scan_tree_idx[MAX_KEY]; > + SEL_TREE **tree; > + TRP_RANGE **cur_child; > + uint removed_cnt= 0; > + > + DBUG_ENTER("merge_same_index_scans"); > + > + bzero(first_scan_tree_idx, sizeof(first_scan_tree_idx[0])*param->keys); > + > + for (tree= imerge->trees, cur_child= imerge_trp->range_scans; > + tree != imerge->trees_next; > + tree++, cur_child++) > + { > + DBUG_ASSERT(tree); > + uint key_idx= (*cur_child)->key_idx; > + uint16 *tree_idx_ptr= &first_scan_tree_idx[key_idx]; > + if (!*tree_idx_ptr) > + *tree_idx_ptr= (uint16) (tree-imerge->trees+1); > + else > + { > + SEL_TREE **changed_tree= imerge->trees+(*tree_idx_ptr-1); > + SEL_ARG *key= (*changed_tree)->keys[key_idx]; > + bzero((*changed_tree)->keys, > + sizeof((*changed_tree)->keys[0])*param->keys); > + (*changed_tree)->keys_map.clear_all(); > + if (((*changed_tree)->keys[key_idx]= > + key_or(param, key, (*tree)->keys[key_idx]))) > + (*changed_tree)->keys_map.set_bit(key_idx); > + *tree= NULL; > + removed_cnt++; > + } > + } > + if (!removed_cnt) > + DBUG_RETURN(imerge_trp); > + > + TABLE_READ_PLAN *trp= NULL; > + SEL_TREE **new_trees_next= imerge->trees; > + for (tree= new_trees_next; tree != imerge->trees_next; tree++) > + { > + if (!*tree) > + continue; > + if (tree > new_trees_next) > + *new_trees_next= *tree; > + new_trees_next++; > + } > + imerge->trees_next= new_trees_next; > + > + DBUG_ASSERT(imerge->trees_next>imerge->trees); > + > + if (imerge->trees_next-imerge->trees > 1) > + trp= get_best_disjunct_quick(param, imerge, read_time); > + else > + { > + /* > + This alternative theoretically can be reached when the cost > + of the index merge for such a formula as > + (key1 BETWEEN c1_1 AND c1_2) AND key2 > c2 OR > + (key1 BETWEEN c1_3 AND c1_4) AND key3 > c3 > + is estimated as being cheaper than the cost of index scan for > + the formula > + (key1 BETWEEN c1_1 AND c1_2) OR (key1 BETWEEN c1_3 AND c1_4) > + > + In the current code this may happen for two reasons: > + 1. for a single index range scan data records are accessed in > + a random order > + 2. the functions that estimate the cost of a range scan and an > + index merge retrievals are not well calibrated > + */ > + trp= get_key_scans_params(param, *imerge->trees, FALSE, TRUE, > + read_time); > + } > + > + DBUG_RETURN(trp); > } > > > @@ -4029,7 +4694,7 @@ > ror_scan->key_rec_length= (param->table->key_info[keynr].key_length + > param->table->file->ref_length); > ror_scan->sel_arg= sel_arg; > - ror_scan->records= param->table->quick_rows[keynr]; > + ror_scan->records= param->quick_rows[keynr]; > > if (!(bitmap_buf= (my_bitmap_map*) alloc_root(param->mem_root, > param->fields_bitmap_size))) > @@ -4049,7 +4714,7 @@ > bitmap_set_bit(&ror_scan->covered_fields, key_part->fieldnr-1); > } > ror_scan->index_read_cost= > - get_index_only_read_time(param, param->table->quick_rows[ror_scan->keynr], > + get_index_only_read_time(param, param->quick_rows[ror_scan->keynr], > ror_scan->keynr); > DBUG_RETURN(ror_scan); > } > @@ -4335,7 +5000,7 @@ > } > if (!prev_covered) > { > - double tmp= rows2double(info->param->table->quick_rows[scan->keynr]) / > + double tmp= rows2double(info->param->quick_rows[scan->keynr]) / > rows2double(prev_records); > DBUG_PRINT("info", ("Selectivity multiplier: %g", tmp)); > selectivity_mult *= tmp; > @@ -4414,7 +5079,7 @@ > } > else > { > - info->index_records += info->param->table->quick_rows[ror_scan->keynr]; > + info->index_records += info->param->quick_rows[ror_scan->keynr]; > info->index_scan_costs += ror_scan->index_read_cost; > bitmap_union(&info->covered_fields, &ror_scan->covered_fields); > if (!info->is_covering && bitmap_is_subset(&info->param->needed_fields, > @@ -6044,13 +6709,138 @@ > return root; > } > > -#define CLONE_KEY1_MAYBE 1 > -#define CLONE_KEY2_MAYBE 2 > -#define swap_clone_flag(A) ((A & 1) << 1) | ((A & 2) >> 1) > > +/* > + Build a range tree for the conjunction of the range parts of two trees > + > + SYNOPSIS > + and_range_trees() > + param Context info for the operation > + tree1 SEL_TREE for the first conjunct > + tree2 SEL_TREE for the second conjunct > + result SEL_TREE for the result > > -static SEL_TREE * > -tree_and(RANGE_OPT_PARAM *param,SEL_TREE *tree1,SEL_TREE *tree2) > + DESCRIPTION > + This function takes range parts of two trees tree1 and tree2 and builds > + a range tree for the conjunction of the formulas that these two range parts > + represent. > + More exactly: > + if the range part of tree1 represents the normalized formula > + R1_1 AND ... AND R1_k, > + and the range part of tree2 represents the normalized formula > + R2_1 AND ... AND R2_k, > + then the range part of the result represents the formula: > + RT = R_1 AND ... AND R_k, where R_i=(R1_i AND R2_i) for each i from [1..k] > + > + The function assumes that tree1 is never equal to tree2. At the same > + time the tree result can be the same as tree1 (but never as tree2). > + If result==tree1 then rt replaces the range part of tree1 leaving > + imerges as they are. > + if result!=tree1 than it is assumed that the SEL_ARG trees in tree1 and > + tree2 should be preserved. Otherwise they can be destroyed. > + > + RETURN > + 1 if the type the result tree is SEL_TREE::IMPOSSIBLE > + 0 otherwise > +*/ > + > +static > +int and_range_trees(RANGE_OPT_PARAM *param, SEL_TREE *tree1, SEL_TREE *tree2, > + SEL_TREE *result) > +{ > + DBUG_ENTER("and_ranges"); > + key_map result_keys; > + result_keys.clear_all(); > + key_map anded_keys= tree1->keys_map; > + anded_keys.merge(tree2->keys_map); > + int key_no; > + key_map::Iterator it(anded_keys); > + while ((key_no= it++) != key_map::Iterator::BITMAP_END) > + { > + uint flag=0; > + SEL_ARG *key1= tree1->keys[key_no]; > + SEL_ARG *key2= tree2->keys[key_no]; > + if (key1 && !key1->simple_key()) > + flag|= CLONE_KEY1_MAYBE; > + if (key2 && !key2->simple_key()) > + flag|=CLONE_KEY2_MAYBE; > + if (result != tree1) > + { > + if (key1) > + key1->incr_refs(); > + if (key2) > + key2->incr_refs(); > + } > + SEL_ARG *key; > + if ((result->keys[key_no]= key =key_and(param, key1, key2, flag))) > + { > + if (key && key->type == SEL_ARG::IMPOSSIBLE) > + { > + result->type= SEL_TREE::IMPOSSIBLE; > + DBUG_RETURN(1); > + } > + result_keys.set_bit(key_no); > +#ifdef EXTRA_DEBUG > + if (param->alloced_sel_args < SEL_ARG::MAX_SEL_ARGS) > + key->test_use_count(key); > +#endif > + } > + } > + result->keys_map= result_keys; > + DBUG_RETURN(0); > +} > + > + > +/* > + Build a SEL_TREE for a conjunction out of such trees for the conjuncts > + > + SYNOPSIS > + tree_and() > + param Context info for the operation > + tree1 SEL_TREE for the first conjunct > + tree2 SEL_TREE for the second conjunct > + > + DESCRIPTION > + This function builds a tree for the formula (A AND B) out of the trees > + tree1 and tree2 that has been built for the formulas A and B respectively. > + > + In a general case > + tree1 represents the formula RT1 AND MT1, > + where RT1 = R1_1 AND ... AND R1_k1, MT1=M1_1 AND ... AND M1_l1; > + tree2 represents the formula RT2 AND MT2 > + where RT2 = R2_1 AND ... AND R2_k2, MT2=M2_1 and ... and M2_l2. > + > + The result tree will represent the formula of the the following structure: > + RT AND MT1 AND MT2 AND RT1MT2 AND RT2MT1, such that > + rt is a tree obtained by range intersection of trees tree1 and tree2, > + RT1MT2 = RT1M2_1 AND ... AND RT1M2_l2, > + RT2MT1 = RT2M1_1 AND ... AND RT2M1_l1, > + where rt1m2_i (i=1,...,l2) is the result of the pushdown operation > + of range tree rt1 into imerge m2_i, while rt2m1_j (j=1,...,l1) is the > + result of the pushdown operation of range tree rt2 into imerge m1_j. > + > + RT1MT2/RT2MT is empty if MT2/MT1 is empty. > + > + The range intersection of two range trees is produced by the function > + and_range_trees. The pushdown of a range tree to a imerge is performed > + by the function imerge_list_and_tree. This function may produce imerges > + containing only one range tree. Such trees are intersected with rt and > + the result of intersection is returned as the range part of the result > + tree, while the corresponding imerges are removed altogether from its > + imerge part. > + > + NOTE. > + The pushdown operation of range trees into imerges is needed to be able > + to construct valid imerges for the condition like this: > + key1_p1=c1 AND (key1_p2 BETWEEN c21 AND c22 OR key2 < c2) > + > + RETURN > + The result tree, if a success > + 0 - otherwise. > +*/ > + > +static > +SEL_TREE *tree_and(RANGE_OPT_PARAM *param, SEL_TREE *tree1, SEL_TREE *tree2) > { > DBUG_ENTER("tree_and"); > if (!tree1) > @@ -6072,87 +6862,218 @@ > tree1->type=SEL_TREE::KEY_SMALLER; > DBUG_RETURN(tree1); > } > - key_map result_keys; > - result_keys.clear_all(); > - > - /* Join the trees key per key */ > - SEL_ARG **key1,**key2,**end; > - for (key1= tree1->keys,key2= tree2->keys,end=key1+param->keys ; > - key1 != end ; key1++,key2++) > + > + if (!tree1->merges.is_empty()) > + imerge_list_and_tree(param, &tree1->merges, tree2); > + if (!tree2->merges.is_empty()) > + imerge_list_and_tree(param, &tree2->merges, tree1); > + if (and_range_trees(param, tree1, tree2, tree1)) > + DBUG_RETURN(tree1); > + imerge_list_and_list(&tree1->merges, &tree2->merges); > + eliminate_single_tree_imerges(param, tree1); > + DBUG_RETURN(tree1); > +} > + > + > +/* > + Eliminate single tree imerges in a SEL_TREE objects > + > + SYNOPSIS > + eliminate_single_tree_imerges() > + param Context info for the function > + tree SEL_TREE where single tree imerges are to be eliminated > + > + DESCRIPTION > + For each imerge in 'tree' that contains only one disjunct tree, i.e. > + for any imerge of the form m=rt, the function performs and operation > + the range part of tree, replaces rt the with the result of anding and > + removes imerge m from the the merge part of 'tree'. > + > + RETURN VALUE > + none > +*/ > + > +static > +void eliminate_single_tree_imerges(RANGE_OPT_PARAM *param, SEL_TREE *tree) > +{ > + SEL_IMERGE *imerge; > + List<SEL_IMERGE> merges= tree->merges; > + List_iterator<SEL_IMERGE> it(merges); > + tree->merges.empty(); > + while ((imerge= it++)) > { > - uint flag=0; > - if (*key1 || *key2) > + if (imerge->trees+1 == imerge->trees_next) > { > - if (*key1 && !(*key1)->simple_key()) > - flag|=CLONE_KEY1_MAYBE; > - if (*key2 && !(*key2)->simple_key()) > - flag|=CLONE_KEY2_MAYBE; > - *key1=key_and(param, *key1, *key2, flag); > - if (*key1 && (*key1)->type == SEL_ARG::IMPOSSIBLE) > - { > - tree1->type= SEL_TREE::IMPOSSIBLE; > - DBUG_RETURN(tree1); > - } > - result_keys.set_bit(key1 - tree1->keys); > -#ifdef EXTRA_DEBUG > - if (*key1 && param->alloced_sel_args < SEL_ARG::MAX_SEL_ARGS) > - (*key1)->test_use_count(*key1); > -#endif > + tree= tree_and(param, tree, *imerge->trees); > + it.remove(); > } > } > - tree1->keys_map= result_keys; > - /* dispose index_merge if there is a "range" option */ > - if (!result_keys.is_clear_all()) > - { > - tree1->merges.empty(); > - DBUG_RETURN(tree1); > - } > + tree->merges= merges; > +} > > - /* ok, both trees are index_merge trees */ > - imerge_list_and_list(&tree1->merges, &tree2->merges); > - DBUG_RETURN(tree1); > + > +/* > + For two trees check that there are indexes with ranges in both of them > + > + SYNOPSIS > + sel_trees_have_common_keys() > + param Context info for the function This paramemeter doesn't seem to be used, please consider removing it. > + tree1 SEL_TREE for the first tree > + tree2 SEL_TREE for the second tree > + common_keys OUT bitmap of all indexes with ranges in both trees > + > + DESCRIPTION > + For two trees tree1 and tree1 the function checks if there are indexes > + in their range parts such that SEL_ARG trees are defined for them in the > + range parts of both trees. The function returns the bitmap of such > + indexes in the parameter common_keys. > + > + RETURN > + TRUE if there are such indexes (common_keys is nor empty) > + FALSE otherwise > +*/ > + > +static > +bool sel_trees_have_common_keys(RANGE_OPT_PARAM* param, > + SEL_TREE *tree1, SEL_TREE *tree2, > + key_map *common_keys) > +{ > + *common_keys= tree1->keys_map; > + common_keys->intersect(tree2->keys_map); > + return !common_keys->is_clear_all(); > } > > > /* > - Check if two SEL_TREES can be combined into one (i.e. a single key range > - read can be constructed for "cond_of_tree1 OR cond_of_tree2" ) without > - using index_merge. > + Check whether range parts of two trees can be ored for some indexes > + > + SYNOPSIS > + sel_trees_can_be_ored() > + param Context info for the function > + tree1 SEL_TREE for the first tree > + tree2 SEL_TREE for the second tree > + common_keys IN/OUT IN: bitmap of all indexes with SEL_ARG in both trees > + OUT: bitmap of all indexes that can be ored > + > + DESCRIPTION > + For two trees tree1 and tree2 and the bitmap common_keys containing > + bits for indexes that have SEL_ARG trees in range parts of both trees > + the function checks if there are indexes for which SEL_ARG trees can > + be ored. Two SEL_ARG trees for the same index can be ored if the most > + major components of the index used in these trees coincide. If the > + SEL_ARG trees for an index cannot be ored the function clears the bit > + for this index in the bitmap common_keys. > + > + The function does not verify that indexes marked in common_keys really > + have SEL_ARG trees in both tree1 and tree2. It assumes that this is true. > + > + NOTE > + The function sel_trees_can_be_ored is usually used in pair with the > + function sel_trees_have_common_keys. > + > + RETURN > + TRUE if there are indexes for which SEL_ARG trees can be ored > + FALSE otherwise > */ > > -bool sel_trees_can_be_ored(SEL_TREE *tree1, SEL_TREE *tree2, > - RANGE_OPT_PARAM* param) > +static > +bool sel_trees_can_be_ored(RANGE_OPT_PARAM* param, > + SEL_TREE *tree1, SEL_TREE *tree2, > + key_map *common_keys) > { > - key_map common_keys= tree1->keys_map; > DBUG_ENTER("sel_trees_can_be_ored"); > - common_keys.intersect(tree2->keys_map); > + if (!sel_trees_have_common_keys(param, tree1, tree2, common_keys)) > + DBUG_RETURN(FALSE); > + int key_no; > + key_map::Iterator it(*common_keys); > + while ((key_no= it++) != key_map::Iterator::BITMAP_END) > + { > + DBUG_ASSERT(tree1->keys[key_no] && tree2->keys[key_no]); > + /* Trees have a common key, check if they refer to the same key part */ > + if (tree1->keys[key_no]->part != tree2->keys[key_no]->part) > + common_keys->clear_bit(key_no); > + } > + DBUG_RETURN(!common_keys->is_clear_all()); > +} > + > +/* > + Check whether range parts of two trees must be ored for some indexes > + > + SYNOPSIS > + sel_trees_must_be_ored() > + param Context info for the function > + tree1 SEL_TREE for the first tree > + tree2 SEL_TREE for the second tree > + ordable_keys bitmap of SEL_ARG trees that can be ored > > - if (common_keys.is_clear_all()) > + DESCRIPTION > + For two trees tree1 and tree2 the function checks whether they must be > + ored. The function assumes that the bitmap ordable_keys contains bits for > + those corresponding pairs of SEL_ARG trees from tree1 and tree2 that can > + be ored. > + We believe that tree1 and tree2 must be ored if any pair of SEL_ARG trees > + r1 and r2, such that r1 is from tree1 and r2 is from tree2 and both > + of them are marked in ordable_keys, can be merged. > + > + NOTE > + The function sel_trees_must_be_ored as a rule is used in pair with the > + function sel_trees_can_be_ored. > + > + RETURN > + TRUE if there are indexes for which SEL_ARG trees must be ored > + FALSE otherwise > +*/ > + > +static > +bool sel_trees_must_be_ored(RANGE_OPT_PARAM* param, > + SEL_TREE *tree1, SEL_TREE *tree2, > + key_map oredable_keys) > +{ > + key_map tmp; > + DBUG_ENTER("sel_trees_must_be_ored"); > + > + tmp= tree1->keys_map; > + tmp.merge(tree2->keys_map); > + tmp.subtract(oredable_keys); > + if (!tmp.is_clear_all()) > DBUG_RETURN(FALSE); > > - /* trees have a common key, check if they refer to same key part */ > - SEL_ARG **key1,**key2; > - for (uint key_no=0; key_no < param->keys; key_no++) > - { > - if (common_keys.is_set(key_no)) > + int idx1, idx2; > + key_map::Iterator it1(oredable_keys); > + while ((idx1= it1++) != key_map::Iterator::BITMAP_END) > + { > + KEY_PART *key1_init= param->key[idx1]+tree1->keys[idx1]->part; > + KEY_PART *key1_end= param->key[idx1]+tree1->keys[idx1]->max_part_no; > + key_map::Iterator it2(oredable_keys); > + while ((idx2= it2++) != key_map::Iterator::BITMAP_END) > { > - key1= tree1->keys + key_no; > - key2= tree2->keys + key_no; > - if ((*key1)->part == (*key2)->part) > - { > - DBUG_RETURN(TRUE); > + if (idx2 <= idx1) > + continue; > + > + KEY_PART *key2_init= param->key[idx2]+tree2->keys[idx2]->part; > + KEY_PART *key2_end= param->key[idx2]+tree2->keys[idx2]->max_part_no; > + KEY_PART *part1, *part2; > + for (part1= key1_init, part2= key2_init; > + part1 < key1_end && part2 < key2_end; > + part1++, part2++) > + { > + if (!part1->field->eq(part2->field)) > + DBUG_RETURN(FALSE); > } > } > } > - DBUG_RETURN(FALSE); > -} > + > + DBUG_RETURN(TRUE); > +} > > > /* > - Remove the trees that are not suitable for record retrieval. > + Remove the trees that are not suitable for record retrieval > + > SYNOPSIS > - param Range analysis parameter > - tree Tree to be processed, tree->type is KEY or KEY_SMALLER > + remove_nonrange_trees() > + param Context info for the function > + tree Tree to be processed, tree->type is KEY or KEY_SMALLER > > DESCRIPTION > This function walks through tree->keys[] and removes the SEL_ARG* trees > @@ -6163,41 +7084,36 @@ > > A SEL_ARG* tree cannot be used to construct quick select if it has > tree->part != 0. (e.g. it could represent "keypart2 < const"). > - > - WHY THIS FUNCTION IS NEEDED > > Normally we allow construction of SEL_TREE objects that have SEL_ARG > - trees that do not allow quick range select construction. For example for > - " keypart1=1 AND keypart2=2 " the execution will proceed as follows: > + trees that do not allow quick range select construction. > + For example: > + for " keypart1=1 AND keypart2=2 " the execution will proceed as follows: > tree1= SEL_TREE { SEL_ARG{keypart1=1} } > tree2= SEL_TREE { SEL_ARG{keypart2=2} } -- can't make quick range select > from this > call tree_and(tree1, tree2) -- this joins SEL_ARGs into a usable SEL_ARG > tree. > - > - There is an exception though: when we construct index_merge SEL_TREE, > - any SEL_ARG* tree that cannot be used to construct quick range select can > - be removed, because current range analysis code doesn't provide any way > - that tree could be later combined with another tree. > - Consider an example: we should not construct > - st1 = SEL_TREE { > - merges = SEL_IMERGE { > - SEL_TREE(t.key1part1 = 1), > - SEL_TREE(t.key2part2 = 2) -- (*) > - } > - }; > - because > - - (*) cannot be used to construct quick range select, > - - There is no execution path that would cause (*) to be converted to > - a tree that could be used. > - > - The latter is easy to verify: first, notice that the only way to convert > - (*) into a usable tree is to call tree_and(something, (*)). > - > - Second look at what tree_and/tree_or function would do when passed a > - SEL_TREE that has the structure like st1 tree has, and conlcude that > - tree_and(something, (*)) will not be called. > > + Another example: > + tree3= SEL_TREE { SEL_ARG{key1part1 = 1} } > + tree4= SEL_TREE { SEL_ARG{key2part2 = 2} } -- can't make quick range select > + from this > + call tree_or(tree3, tree4) -- creates a SEL_MERGE ot of which no index > + merge can be constructed, but it is potentially useful, as anding it with > + tree5= SEL_TREE { SEL_ARG{key2part1 = 3} } creates an index merge that > + represents the formula > + key1part1=1 AND key2part1=3 OR key2part1=3 AND key2part2=2 > + for which an index merge can be built. > + > + Any final SEL_TREE may contains SEL_ARG trees for which no quick select s/contains/contain/? > + can be built. Such SEL_ARG trees should be removed from the range part > + before different range scans are evaluated. Such SEL_ARG trees also should > + be removed from all range trees of each index merge before different > + possible index merge plans are evaluated. If after this removal one > + of the range trees in the index merge becomes empty the whole index merge > + must be discarded. > + > RETURN > 0 Ok, some suitable trees left > 1 No tree->keys[] left. > @@ -6223,6 +7139,74 @@ > } > > > +/* > + Build a SEL_TREE for a disjunction out of such trees for the disjuncts > + > + SYNOPSIS > + tree_or() > + param Context info for the operation > + tree1 SEL_TREE for the first disjunct > + tree2 SEL_TREE for the second disjunct > + > + DESCRIPTION > + This function builds a tree for the formula (A OR B) out of the trees > + tree1 and tree2 that has been built for the formulas A and B respectively. > + > + In a general case > + tree1 represents the formula RT1 AND MT1, > + where RT1=R1_1 AND ... AND R1_k1, MT1=M1_1 AND ... AND M1_l1; > + tree2 represents the formula RT2 AND MT2 > + where RT2=R2_1 AND ... AND R2_k2, MT2=M2_1 and ... and M2_l2. > + > + The function constructs the result tree according the formula > + (RT1 OR RT2) AND (MT1 OR RT1) AND (MT2 OR RT2) AND (MT1 OR MT2) > + that is equivalent to the formula (RT1 AND MT1) OR (RT2 AND MT2). > + > + To limit the number of produced imerges the function considers > + a weaker formula than the original one: > + (RT1 AND M1_1) OR (RT2 AND M2_1) > + that is equivalent to: > + (RT1 OR RT2) (1) > + AND > + (M1_1 OR M2_1) (2) > + AND > + (M1_1 OR RT2) (3) > + AND > + (M2_1 OR RT1) (4) > + > + For the first conjunct (1) the function builds a tree with a range part > + and, possibly, one imerge. For the other conjuncts (2-4)the function > + produces sets of imerges. All constructed imerges are included into the > + result tree. > + > + For the formula (1) the function produces the tree representing a formula > + of the structure RT [AND M], such that: > + - the range tree rt contains the result of oring SEL_ARG trees from rt1 > + and rt2 > + - the imerge m consists of two range trees rt1 and rt2. > + The imerge m is added if it's not true that rt1 and rt2 must be ored > + If rt1 and rt2 can't be ored rt is empty and only m is produced for (1). > + > + To produce imerges for the formula (2) the function calls the function > + imerge_list_or_list passing it the merge parts of tree1 and tree2 as > + parameters. > + > + To produce imerges for the formula (3) the function calls the function > + imerge_list_or_tree passing it the imerge m1_1 and the range tree rt2 as > + parameters. Similarly, to produce imerges for the formula (4) the function > + calls the function imerge_list_or_tree passing it the imerge m2_1 and the > + range tree rt1. > + > + If rt1 is empty then the trees for (1) and (4) are empty. > + If rt2 is empty then the trees for (1) and (3) are empty. > + If mt1 is empty then the trees for (2) and (3) are empty. > + If mt2 is empty then the trees for (2) and (4) are empty. > + > + RETURN > + The result tree for the operation if a success > + 0 - otherwise > +*/ > + > static SEL_TREE * > tree_or(RANGE_OPT_PARAM *param,SEL_TREE *tree1,SEL_TREE *tree2) > { > @@ -6238,74 +7222,100 @@ > if (tree2->type == SEL_TREE::MAYBE) > DBUG_RETURN(tree2); > > - SEL_TREE *result= 0; > - key_map result_keys; > - result_keys.clear_all(); > - if (sel_trees_can_be_ored(tree1, tree2, param)) > + SEL_TREE *result= NULL; > + key_map result_keys; > + key_map ored_keys; > + SEL_TREE *rtree[2]= {NULL,NULL}; > + SEL_IMERGE *imerge[2]= {NULL, NULL}; > + bool no_ranges1= tree1->without_ranges(); This variable is only used negated, "!no_ranges". Double negation makes the code very hard to read. Could you please change to positive notation? (the same goes for without_ranges() and without_imerges())? > + bool no_ranges2= tree2->without_ranges(); > + bool no_merges1= tree1->without_imerges(); > + bool no_merges2= tree2->without_imerges(); > + if (!no_ranges1 && !no_merges2) > + { > + rtree[0]= new SEL_TREE(tree1, TRUE, param); > + imerge[1]= new SEL_IMERGE(tree2->merges.head(), 0, param); > + } > + if (!no_ranges2 && !no_merges1) > { > - /* Join the trees key per key */ > - SEL_ARG **key1,**key2,**end; > - for (key1= tree1->keys,key2= tree2->keys,end= key1+param->keys ; > - key1 != end ; key1++,key2++) > - { > - *key1=key_or(param, *key1, *key2); > - if (*key1) > + rtree[1]= new SEL_TREE(tree2, TRUE, param); > + imerge[0]= new SEL_IMERGE(tree1->merges.head(), 0, param); > + } > + bool no_imerge_from_ranges= FALSE; > + if (!(result= new SEL_TREE())) > + DBUG_RETURN(result); > + > + /* Build the range part of the tree for the formula (1) */ > + if (sel_trees_can_be_ored(param, tree1, tree2, &ored_keys)) > + { > + bool must_be_ored= sel_trees_must_be_ored(param, tree1, tree2, ored_keys); > + no_imerge_from_ranges= must_be_ored; > + key_map::Iterator it(ored_keys); > + int key_no; > + while ((key_no= it++) != key_map::Iterator::BITMAP_END) > + { > + SEL_ARG *key1= tree1->keys[key_no]; > + SEL_ARG *key2= tree2->keys[key_no]; > + if (!must_be_ored) > { > - result=tree1; // Added to tree1 > - result_keys.set_bit(key1 - tree1->keys); > -#ifdef EXTRA_DEBUG > - if (param->alloced_sel_args < SEL_ARG::MAX_SEL_ARGS) > - (*key1)->test_use_count(*key1); > -#endif > + key1->incr_refs(); > + key2->incr_refs(); > } > + if ((result->keys[key_no]= key_or(param, key1, key2))) > + result->keys_map.set_bit(key_no); > } > - if (result) > - result->keys_map= result_keys; > + result->type= tree1->type; > } > - else > + > + if (no_imerge_from_ranges && no_merges1 && no_merges2) > { > - /* ok, two trees have KEY type but cannot be used without index merge */ > - if (tree1->merges.is_empty() && tree2->merges.is_empty()) > - { > - if (param->remove_jump_scans) > - { > - bool no_trees= remove_nonrange_trees(param, tree1); > - no_trees= no_trees || remove_nonrange_trees(param, tree2); > - if (no_trees) > - DBUG_RETURN(new SEL_TREE(SEL_TREE::ALWAYS)); > - } > - SEL_IMERGE *merge; > - /* both trees are "range" trees, produce new index merge structure */ > - if (!(result= new SEL_TREE()) || !(merge= new SEL_IMERGE()) || > - (result->merges.push_back(merge)) || > - (merge->or_sel_tree(param, tree1)) || > - (merge->or_sel_tree(param, tree2))) > + if (result->keys_map.is_clear_all()) > + result->type= SEL_TREE::ALWAYS; > + DBUG_RETURN(result); > + } > + > + SEL_IMERGE *imerge_from_ranges; > + if (!(imerge_from_ranges= new SEL_IMERGE())) > + result= NULL; > + else if (!no_ranges1 && !no_ranges2 && !no_imerge_from_ranges) > + { > + /* Build the imerge part of the tree for the formula (1) */ > + SEL_TREE *rt1= tree1; > + SEL_TREE *rt2= tree2; > + if (!no_merges1) > + rt1= new SEL_TREE(tree1, TRUE, param); > + if (!no_merges2) > + rt2= new SEL_TREE(tree2, TRUE, param); > + if (!rt1 || !rt2 || > + result->merges.push_back(imerge_from_ranges) || > + imerge_from_ranges->or_sel_tree(param, rt1) || > + imerge_from_ranges->or_sel_tree(param, rt2)) > result= NULL; ^^^^^^ Wrong identation > - else > - result->type= tree1->type; > - } > - else if (!tree1->merges.is_empty() && !tree2->merges.is_empty()) > - { > - if (imerge_list_or_list(param, &tree1->merges, &tree2->merges)) > - result= new SEL_TREE(SEL_TREE::ALWAYS); > - else > - result= tree1; > - } > - else > - { > - /* one tree is index merge tree and another is range tree */ > - if (tree1->merges.is_empty()) > - swap_variables(SEL_TREE*, tree1, tree2); > - > - if (param->remove_jump_scans && remove_nonrange_trees(param, tree2)) > - DBUG_RETURN(new SEL_TREE(SEL_TREE::ALWAYS)); > - /* add tree2 to tree1->merges, checking if it collapses to ALWAYS */ > - if (imerge_list_or_tree(param, &tree1->merges, tree2)) > - result= new SEL_TREE(SEL_TREE::ALWAYS); > - else > - result= tree1; > - } > } > + if (!result) > + DBUG_RETURN(result); > + > + result->type= tree1->type; > + > + if (!no_merges1 && !no_merges2 && > + !imerge_list_or_list(param, &tree1->merges, &tree2->merges)) > + { > + /* Build the imerges for the formula (2) */ > + imerge_list_and_list(&result->merges, &tree1->merges); > + } > + > + /* Build the imerges for the formulas (3) and (4) */ > + for (uint i=0; i < 2; i++) > + { > + List<SEL_IMERGE> merges; > + SEL_TREE *rt= rtree[i]; > + SEL_IMERGE *im= imerge[1-i]; > + > + if (rt && im && !merges.push_back(im) && > + !imerge_list_or_tree(param, &merges, rt)) > + imerge_list_and_list(&result->merges, &merges); > + } > + > DBUG_RETURN(result); > } > > @@ -6351,6 +7361,7 @@ > if (!key1) > return &null_element; // Impossible ranges > key1->use_count++; > + key1->max_part_no= max(key2->max_part_no, key2->part+1); > return key1; > } > > @@ -6450,6 +7461,7 @@ > key1->use_count--; > key2->use_count--; > SEL_ARG *e1=key1->first(), *e2=key2->first(), *new_tree=0; > + uint max_part_no= max(key1->max_part_no, key2->max_part_no); > > while (e1 && e2) > { > @@ -6487,6 +7499,7 @@ > key2->free_tree(); > if (!new_tree) > return &null_element; // Impossible range > + new_tree->max_part_no= max_part_no; > return new_tree; > } > > @@ -6558,7 +7571,7 @@ > { > swap_variables(SEL_ARG *,key1,key2); > } > - if (key1->use_count > 0 || !(key1=key1->clone_tree(param))) > + if (key1->use_count > 0 && !(key1=key1->clone_tree(param))) > return 0; // OOM > } > > @@ -6566,6 +7579,8 @@ > bool key2_shared=key2->use_count != 0; > key1->maybe_flag|=key2->maybe_flag; > > + uint max_part_no= max(key1->max_part_no, key2->max_part_no); > + > for (key2=key2->first(); key2; ) > { > SEL_ARG *tmp=key1->find_range(key2); // Find key1.min <= key2.min > @@ -6758,6 +7773,7 @@ > key2=next; > } > key1->use_count++; > + key1->max_part_no= max_part_no; Is it intentional that here in key_or() max_part_no calculation is so imprecise? One can construct many cases where the result of key_or()'s will have result->max_part_no < max(key1->max_part_no, key2->max_part_no); e.g. consider key_or(keypart1=c1, (keypart1=c1 AND keypart2=c2)) I think it's fairly easy to have this function to save the real max_part_no? (if we don't want to do that, we need a comment near max_part_no's definition about that it is an upper bound, and a couple of sentences in this function that would explain why we chose not to calculate real max_part_no). > return key1; > } > > @@ -7237,11 +8253,7 @@ > void SEL_ARG::test_use_count(SEL_ARG *root) > { > uint e_count=0; > - if (this == root && use_count != 1) > - { > - sql_print_information("Use_count: Wrong count %lu for root",use_count); > - return; > - } > + > if (this->type != SEL_ARG::KEY_RANGE) > return; > for (SEL_ARG *pos=first(); pos ; pos=pos->next) > @@ -7344,12 +8356,13 @@ > param->table->quick_n_ranges[key]= param->n_ranges; > param->table->quick_condition_rows= > min(param->table->quick_condition_rows, records); > + param->table->quick_rows[key]= records; > } > /* > Need to save quick_rows in any case as it is used when calculating > cost of ROR intersection: > */ > - param->table->quick_rows[key]=records; > + param->quick_rows[key]= records; > if (cpk_scan) > param->is_ror_scan= TRUE; > } > diff -urN --exclude='.*' maria-5.1-wl24-clean2/sql/sql_list.h maria-5.1-wl24-nocompile/sql/sql_list.h > --- maria-5.1-wl24-clean2/sql/sql_list.h 2009-11-03 00:18:22.000000000 +0300 > +++ maria-5.1-wl24-nocompile/sql/sql_list.h 2009-11-03 00:11:54.000000000 +0300 > @@ -168,6 +168,14 @@ > { > if (!list->is_empty()) > { > +#if 0 > +#else > + if (is_empty()) > + { > + *this= *list; > + return; > + } > +#endif Please remove #if/#else. > *last= list->first; > last= list->last; > elements+= list->elements; BR Sergey -- Sergey Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog

1 0

[Maria-developers] [Merge] lp:~atcurtis/maria/maria-5.1-const.part1 into lp:maria
by Michael Widenius 04 Nov '09

04 Nov '09

Hi! >>>>> "Antony" == Antony T Curtis <Antony> writes: Antony> Antony T Curtis has proposed merging lp:~atcurtis/maria/maria-5.1-const.part1 into lp:maria. Antony> Requested reviews: Antony> Maria-captains (maria-captains) Antony> declare all constant structures in strings/* as 'const'. Antony> This is less ambitious than previous request. Thanks for doing this in small steps! Antony> === modified file 'strings/ctype-uca.c' <cut> Antony> @@ -7838,7 +7837,7 @@ Antony> uchar *newlengths; Antony> uint16 **newweights; Antony> const uchar *deflengths= uca_length; Antony> - uint16 **defweights= uca_weight; Antony> + const uint16 *const *defweights= uca_weight; Antony> int rc, i; Antony> int ncontractions= 0; Antony> @@ -7928,11 +7927,11 @@ Antony> for (i= 0; i < 256 ; i++) Antony> { Antony> if (!newweights[i]) Antony> - newweights[i]= defweights[i]; Antony> + ((const uint16**) newweights)[i]= defweights[i]; Antony> } Antony> cs->sort_order= newlengths; Antony> - cs->sort_order_big= newweights; Antony> + cs->sort_order_big= (const uint16**) newweights; Antony> cs->contractions= NULL; Can you explain why the last cast is needed ? Shouldn't you be able to assign a non const variable to a const member without a cast? Otherwise the patch looks good to me and it would be ok to include this in MariaDB 5.2. I assume that the 5.1-const tree doesn't include anything else than this patch ? Regards, Monty

2 1

[Maria-developers] Updated (by Alexi): Add a mysqlbinlog option to filter updates to certain tables (40)
by worklog-noreply＠askmonty.org 04 Nov '09

04 Nov '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Add a mysqlbinlog option to filter updates to certain tables CREATION DATE..: Mon, 10 Aug 2009, 13:25 SUPERVISOR.....: Monty IMPLEMENTOR....: COPIES TO......: Psergey CATEGORY.......: Server-RawIdeaBin TASK ID........: 40 (http://askmonty.org/worklog/?tid=40) VERSION........: Server-9.x STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 2 ESTIMATE.......: 48 (hours remain) ORIG. ESTIMATE.: 48 PROGRESS NOTES: -=-=(Alexi - Wed, 04 Nov 2009, 10:21)=-=- Low Level Design modified. --- /tmp/wklog.40.old.6734 2009-11-04 10:21:20.000000000 +0200 +++ /tmp/wklog.40.new.6734 2009-11-04 10:21:20.000000000 +0200 @@ -21,9 +21,9 @@ slave_proxy_id 4 exec_time 4 db_len 1 -+ query_len 2 (see Note 1) error_code 2 status_vars_len 2 ++ query_len 2 (see Note 1) + tables_info_len 2 (see Note 2) --------------------------------- BODY: -=-=(Alexi - Tue, 03 Nov 2009, 11:19)=-=- Low Level Design modified. --- /tmp/wklog.40.old.7187 2009-11-03 11:19:22.000000000 +0200 +++ /tmp/wklog.40.new.7187 2009-11-03 11:19:22.000000000 +0200 @@ -1 +1,132 @@ +OPTION: 2.5 Extend Query Events With Tables Info +================================================ +1. Query_log_event Binary Format +******************************** +Changes to be done: + + Query_log_event binary format + --------------------------------- + Name Size (bytes) + --------------------------------- + COMMON HEADER: + timestamp 4 + type 1 + server_id 4 + total_size 4 + master_position 4 + flags 2 + --------------------------------- + POST HEADER: + slave_proxy_id 4 + exec_time 4 + db_len 1 ++ query_len 2 (see Note 1) + error_code 2 + status_vars_len 2 ++ tables_info_len 2 (see Note 2) + --------------------------------- + BODY: + status_vars status_vars_len +- db db_len + 1 ++ db db_len (see Note 3) + query query_len ++ tables_info + + tables_info binary format + --------------------------------- + Name Size (bytes) + --------------------------------- + db_len 1 (see Note 4) + db db_len + table_name_len 1 + table_name table_name_len + ... + db_len 1 + db db_len + table_name_len 1 + table_name table_name_len + +NOTES +1. Currently Query_log_event format doesn't include 'query_len' because + it considers the query to extent to the end of the event. +2. If tables_info is not included in the event (--binlog-with-tables-info + option), tables_info_len = 0. +3. The trailing zero is redundant since the length is already known. +4. In case of db = current db, db_len = 0 and db = empty, because + current db is already included in the current event format. + +2. Where to get tables info from? +********************************* + +2.1. Case study: CREATE TABLE +****************************** + +*** CREATE TABLE table [SELECT ...] + + bool mysql_create_table_no_lock( + THD *thd, + const char *db, + const char *table_name, ...) + { + ... + // ------------------------------------- + // WL40: To be included in tables_info: + // * db, table_name + // * thd->lex->query_tables (tables refered to in + // the select-part; empty if no select-part) + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + +*** CREATE TABLE table LIKE src-table + + bool mysql_create_like_table( + ... + TABLE_LIST *table, + TABLE_LIST *src_table, + ...) + { + ... + if (thd->current_stmt_binlog_row_based) + { // RBR: In this case we don't replicate temp tables + if (!(create_info->options & HA_LEX_CREATE_TMP_TABLE)) + { + if (src_table->table->s->tmp_table) + { // CREATE normal-table LIKE temp-table: + + // Generate new query without LIKE-part + store_create_info(thd, table, &query, create_info, FALSE); + + // ------------------------------------- + // WL40: To include to tables_info: + // * table (src_table is not included) + // ------------------------------------- + write_bin_log(thd, TRUE, query.ptr(), query.length()); + } + else + { // CREATE normal-table LIKE normal-table + + // ------------------------------------- + // WL40: To include to log_tables_info: + // * table + // * src_table + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + } + // CREATE temp-table LIKE ... + // This case is not replicated + } + else + { // SBR: + // ------------------------------------- + // WL40: To include to tables_info: + // * table + // * src_table + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + } + +To be continued -=-=(Alexi - Mon, 02 Nov 2009, 11:34)=-=- Worked 2 hours on option 2.5 Worked 2 hours and estimate 48 hours remain (original estimate increased by 50 hours). -=-=(Alexi - Mon, 02 Nov 2009, 11:20)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.4848 2009-11-02 11:20:24.000000000 +0200 +++ /tmp/wklog.40.new.4848 2009-11-02 11:20:24.000000000 +0200 @@ -90,3 +90,25 @@ It might be useful to integrate this with the code that already handles --replicate-ignore-db and similar slave options. +2.5 Extend Query Events With Tables Info +---------------------------------------- + +We could extend query events structure with a tables info - a list of tables +which the query refers to: + + <current query event structure> + tables_info_len + dbase_len dbase + table_len table + ... + dbase_len dbase + table_len table + +Note. In case of <dbase> = current data base, we can set dbase_len = 0 + and dbase = empty because current query event structure already + includes current data base name. + +Note. Possibly it is reasonable also to add a --binlog-with-tables-info + option which defines whether tables info must be included to the + query events. + -=-=(Knielsen - Fri, 14 Aug 2009, 15:47)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.10896 2009-08-14 15:47:39.000000000 +0300 +++ /tmp/wklog.40.new.10896 2009-08-14 15:47:39.000000000 +0300 @@ -72,3 +72,21 @@ /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... and further processing in mysqlbinlog will be trivial. + +2.4 Implement server functionality to ignore certain tables +----------------------------------------------------------- + +We could add a general facility in the server to ignore certain tables: + + SET SESSION ignored_tables = "db1.t1,db2.t2"; + +This would work similar to --replicate-ignore-table, but in a general way not +restricted to the slave SQL thread. + +It would then be trivial for mysqlbinlog to add such statements at the start +of the output, or probably the user could just do it manually with no need for +additional options for mysqlbinlog. + +It might be useful to integrate this with the code that already handles +--replicate-ignore-db and similar slave options. + -=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.12989 2009-08-10 15:41:23.000000000 +0300 +++ /tmp/wklog.40.new.12989 2009-08-10 15:41:23.000000000 +0300 @@ -1,6 +1,7 @@ - 1. Context ---------- +(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global +overview) At the moment, the server has these replication slave options: --replicate-do-table=db.tbl -=-=(Guest - Mon, 10 Aug 2009, 14:52)=-=- Dependency created: 39 now depends on 40 -=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=- High Level Description modified. --- /tmp/wklog.40.old.16985 2009-08-10 14:51:59.000000000 +0300 +++ /tmp/wklog.40.new.16985 2009-08-10 14:51:59.000000000 +0300 @@ -1,3 +1,4 @@ Replication slave can be set to filter updates to certain tables with ---replicate-[wild-]{do,ignore}-table options. This task is about adding similar -functionality to mysqlbinlog. +--replicate-[wild-]{do,ignore}-table options. + +This task is about adding similar functionality to mysqlbinlog. -=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.16949 2009-08-10 14:51:33.000000000 +0300 +++ /tmp/wklog.40.new.16949 2009-08-10 14:51:33.000000000 +0300 @@ -1 +1,73 @@ +1. Context +---------- +At the moment, the server has these replication slave options: + + --replicate-do-table=db.tbl + --replicate-ignore-table=db.tbl + --replicate-wild-do-table=pattern.pattern + --replicate-wild-ignore-table=pattern.pattern + +They affect both RBR and SBR events. SBR events are checked after the +statement has been parsed, the server iterates over list of used tables and +checks them againist --replicate instructions. + +What is interesting is that this scheme still allows to update the ignored +table through a VIEW. + +2. Table filtering in mysqlbinlog +--------------------------------- + +Per-table filtering of RBR events is easy (as it is relatively easy to extract +the name of the table that the event applies to). + +Per-table filtering of SBR events is hard, as generally it is not apparent +which tables the statement refers to. + +This opens possible options: + +2.1 Put the parser into mysqlbinlog +----------------------------------- +Once we have a full parser in mysqlbinlog, we'll be able to check which tables +are used by a statement, and will allow to show behaviour identical to those +that one obtains when using --replicate-* slave options. + +(It is not clear how much effort is needed to put the parser into mysqlbinlog. +Any guesses?) + + +2.2 Use dumb regexp match +------------------------- +Use a really dumb approach. A query is considered to be modifying table X if +it matches an expression + +CREATE TABLE $tablename +DROP $tablename +UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET' +DELETE ...$tablename ... WHERE // same as above +ALTER TABLE $tablename +.. etc (go get from the grammar) .. + +The advantage over doing the same in awk is that mysqlbinlog will also process +RBR statements, and together with that will provide a working solution for +those who are careful with their table names not mixing with string constants +and such. + +(TODO: string constants are of particular concern as they come from +[potentially hostile] users, unlike e.g. table aliases which come from +[not hostile] developers. Remove also all string constants before attempting +to do match?) + +2.3 Have the master put annotations +----------------------------------- +We could add a master option so that it injects into query a mark that tells +which tables the query will affect, e.g. for the query + + UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ... + + +the binlog will have + + /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... + +and further processing in mysqlbinlog will be trivial. DESCRIPTION: Replication slave can be set to filter updates to certain tables with --replicate-[wild-]{do,ignore}-table options. This task is about adding similar functionality to mysqlbinlog. HIGH-LEVEL SPECIFICATION: 1. Context ---------- (See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global overview) At the moment, the server has these replication slave options: --replicate-do-table=db.tbl --replicate-ignore-table=db.tbl --replicate-wild-do-table=pattern.pattern --replicate-wild-ignore-table=pattern.pattern They affect both RBR and SBR events. SBR events are checked after the statement has been parsed, the server iterates over list of used tables and checks them againist --replicate instructions. What is interesting is that this scheme still allows to update the ignored table through a VIEW. 2. Table filtering in mysqlbinlog --------------------------------- Per-table filtering of RBR events is easy (as it is relatively easy to extract the name of the table that the event applies to). Per-table filtering of SBR events is hard, as generally it is not apparent which tables the statement refers to. This opens possible options: 2.1 Put the parser into mysqlbinlog ----------------------------------- Once we have a full parser in mysqlbinlog, we'll be able to check which tables are used by a statement, and will allow to show behaviour identical to those that one obtains when using --replicate-* slave options. (It is not clear how much effort is needed to put the parser into mysqlbinlog. Any guesses?) 2.2 Use dumb regexp match ------------------------- Use a really dumb approach. A query is considered to be modifying table X if it matches an expression CREATE TABLE $tablename DROP $tablename UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET' DELETE ...$tablename ... WHERE // same as above ALTER TABLE $tablename .. etc (go get from the grammar) .. The advantage over doing the same in awk is that mysqlbinlog will also process RBR statements, and together with that will provide a working solution for those who are careful with their table names not mixing with string constants and such. (TODO: string constants are of particular concern as they come from [potentially hostile] users, unlike e.g. table aliases which come from [not hostile] developers. Remove also all string constants before attempting to do match?) 2.3 Have the master put annotations ----------------------------------- We could add a master option so that it injects into query a mark that tells which tables the query will affect, e.g. for the query UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ... the binlog will have /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... and further processing in mysqlbinlog will be trivial. 2.4 Implement server functionality to ignore certain tables ----------------------------------------------------------- We could add a general facility in the server to ignore certain tables: SET SESSION ignored_tables = "db1.t1,db2.t2"; This would work similar to --replicate-ignore-table, but in a general way not restricted to the slave SQL thread. It would then be trivial for mysqlbinlog to add such statements at the start of the output, or probably the user could just do it manually with no need for additional options for mysqlbinlog. It might be useful to integrate this with the code that already handles --replicate-ignore-db and similar slave options. 2.5 Extend Query Events With Tables Info ---------------------------------------- We could extend query events structure with a tables info - a list of tables which the query refers to: <current query event structure> tables_info_len dbase_len dbase table_len table ... dbase_len dbase table_len table Note. In case of <dbase> = current data base, we can set dbase_len = 0 and dbase = empty because current query event structure already includes current data base name. Note. Possibly it is reasonable also to add a --binlog-with-tables-info option which defines whether tables info must be included to the query events. LOW-LEVEL DESIGN: OPTION: 2.5 Extend Query Events With Tables Info ================================================ 1. Query_log_event Binary Format ******************************** Changes to be done: Query_log_event binary format --------------------------------- Name Size (bytes) --------------------------------- COMMON HEADER: timestamp 4 type 1 server_id 4 total_size 4 master_position 4 flags 2 --------------------------------- POST HEADER: slave_proxy_id 4 exec_time 4 db_len 1 error_code 2 status_vars_len 2 + query_len 2 (see Note 1) + tables_info_len 2 (see Note 2) --------------------------------- BODY: status_vars status_vars_len - db db_len + 1 + db db_len (see Note 3) query query_len + tables_info tables_info binary format --------------------------------- Name Size (bytes) --------------------------------- db_len 1 (see Note 4) db db_len table_name_len 1 table_name table_name_len ... db_len 1 db db_len table_name_len 1 table_name table_name_len NOTES 1. Currently Query_log_event format doesn't include 'query_len' because it considers the query to extent to the end of the event. 2. If tables_info is not included in the event (--binlog-with-tables-info option), tables_info_len = 0. 3. The trailing zero is redundant since the length is already known. 4. In case of db = current db, db_len = 0 and db = empty, because current db is already included in the current event format. 2. Where to get tables info from? ********************************* 2.1. Case study: CREATE TABLE ****************************** *** CREATE TABLE table [SELECT ...] bool mysql_create_table_no_lock( THD *thd, const char *db, const char *table_name, ...) { ... // ------------------------------------- // WL40: To be included in tables_info: // * db, table_name // * thd->lex->query_tables (tables refered to in // the select-part; empty if no select-part) // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } *** CREATE TABLE table LIKE src-table bool mysql_create_like_table( ... TABLE_LIST *table, TABLE_LIST *src_table, ...) { ... if (thd->current_stmt_binlog_row_based) { // RBR: In this case we don't replicate temp tables if (!(create_info->options & HA_LEX_CREATE_TMP_TABLE)) { if (src_table->table->s->tmp_table) { // CREATE normal-table LIKE temp-table: // Generate new query without LIKE-part store_create_info(thd, table, &query, create_info, FALSE); // ------------------------------------- // WL40: To include to tables_info: // * table (src_table is not included) // ------------------------------------- write_bin_log(thd, TRUE, query.ptr(), query.length()); } else { // CREATE normal-table LIKE normal-table // ------------------------------------- // WL40: To include to log_tables_info: // * table // * src_table // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } } // CREATE temp-table LIKE ... // This case is not replicated } else { // SBR: // ------------------------------------- // WL40: To include to tables_info: // * table // * src_table // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } } To be continued ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0

[Maria-developers] Updated (by Alexi): Add a mysqlbinlog option to filter updates to certain tables (40)
by worklog-noreply＠askmonty.org 04 Nov '09

04 Nov '09

----------------------------------------------------------------------- WORKLOG TASK -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- TASK...........: Add a mysqlbinlog option to filter updates to certain tables CREATION DATE..: Mon, 10 Aug 2009, 13:25 SUPERVISOR.....: Monty IMPLEMENTOR....: COPIES TO......: Psergey CATEGORY.......: Server-RawIdeaBin TASK ID........: 40 (http://askmonty.org/worklog/?tid=40) VERSION........: Server-9.x STATUS.........: Un-Assigned PRIORITY.......: 60 WORKED HOURS...: 2 ESTIMATE.......: 48 (hours remain) ORIG. ESTIMATE.: 48 PROGRESS NOTES: -=-=(Alexi - Wed, 04 Nov 2009, 10:21)=-=- Low Level Design modified. --- /tmp/wklog.40.old.6734 2009-11-04 10:21:20.000000000 +0200 +++ /tmp/wklog.40.new.6734 2009-11-04 10:21:20.000000000 +0200 @@ -21,9 +21,9 @@ slave_proxy_id 4 exec_time 4 db_len 1 -+ query_len 2 (see Note 1) error_code 2 status_vars_len 2 ++ query_len 2 (see Note 1) + tables_info_len 2 (see Note 2) --------------------------------- BODY: -=-=(Alexi - Tue, 03 Nov 2009, 11:19)=-=- Low Level Design modified. --- /tmp/wklog.40.old.7187 2009-11-03 11:19:22.000000000 +0200 +++ /tmp/wklog.40.new.7187 2009-11-03 11:19:22.000000000 +0200 @@ -1 +1,132 @@ +OPTION: 2.5 Extend Query Events With Tables Info +================================================ +1. Query_log_event Binary Format +******************************** +Changes to be done: + + Query_log_event binary format + --------------------------------- + Name Size (bytes) + --------------------------------- + COMMON HEADER: + timestamp 4 + type 1 + server_id 4 + total_size 4 + master_position 4 + flags 2 + --------------------------------- + POST HEADER: + slave_proxy_id 4 + exec_time 4 + db_len 1 ++ query_len 2 (see Note 1) + error_code 2 + status_vars_len 2 ++ tables_info_len 2 (see Note 2) + --------------------------------- + BODY: + status_vars status_vars_len +- db db_len + 1 ++ db db_len (see Note 3) + query query_len ++ tables_info + + tables_info binary format + --------------------------------- + Name Size (bytes) + --------------------------------- + db_len 1 (see Note 4) + db db_len + table_name_len 1 + table_name table_name_len + ... + db_len 1 + db db_len + table_name_len 1 + table_name table_name_len + +NOTES +1. Currently Query_log_event format doesn't include 'query_len' because + it considers the query to extent to the end of the event. +2. If tables_info is not included in the event (--binlog-with-tables-info + option), tables_info_len = 0. +3. The trailing zero is redundant since the length is already known. +4. In case of db = current db, db_len = 0 and db = empty, because + current db is already included in the current event format. + +2. Where to get tables info from? +********************************* + +2.1. Case study: CREATE TABLE +****************************** + +*** CREATE TABLE table [SELECT ...] + + bool mysql_create_table_no_lock( + THD *thd, + const char *db, + const char *table_name, ...) + { + ... + // ------------------------------------- + // WL40: To be included in tables_info: + // * db, table_name + // * thd->lex->query_tables (tables refered to in + // the select-part; empty if no select-part) + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + +*** CREATE TABLE table LIKE src-table + + bool mysql_create_like_table( + ... + TABLE_LIST *table, + TABLE_LIST *src_table, + ...) + { + ... + if (thd->current_stmt_binlog_row_based) + { // RBR: In this case we don't replicate temp tables + if (!(create_info->options & HA_LEX_CREATE_TMP_TABLE)) + { + if (src_table->table->s->tmp_table) + { // CREATE normal-table LIKE temp-table: + + // Generate new query without LIKE-part + store_create_info(thd, table, &query, create_info, FALSE); + + // ------------------------------------- + // WL40: To include to tables_info: + // * table (src_table is not included) + // ------------------------------------- + write_bin_log(thd, TRUE, query.ptr(), query.length()); + } + else + { // CREATE normal-table LIKE normal-table + + // ------------------------------------- + // WL40: To include to log_tables_info: + // * table + // * src_table + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + } + // CREATE temp-table LIKE ... + // This case is not replicated + } + else + { // SBR: + // ------------------------------------- + // WL40: To include to tables_info: + // * table + // * src_table + // ------------------------------------- + write_bin_log(thd, TRUE, thd->query, thd->query_length); + } + } + +To be continued -=-=(Alexi - Mon, 02 Nov 2009, 11:34)=-=- Worked 2 hours on option 2.5 Worked 2 hours and estimate 48 hours remain (original estimate increased by 50 hours). -=-=(Alexi - Mon, 02 Nov 2009, 11:20)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.4848 2009-11-02 11:20:24.000000000 +0200 +++ /tmp/wklog.40.new.4848 2009-11-02 11:20:24.000000000 +0200 @@ -90,3 +90,25 @@ It might be useful to integrate this with the code that already handles --replicate-ignore-db and similar slave options. +2.5 Extend Query Events With Tables Info +---------------------------------------- + +We could extend query events structure with a tables info - a list of tables +which the query refers to: + + <current query event structure> + tables_info_len + dbase_len dbase + table_len table + ... + dbase_len dbase + table_len table + +Note. In case of <dbase> = current data base, we can set dbase_len = 0 + and dbase = empty because current query event structure already + includes current data base name. + +Note. Possibly it is reasonable also to add a --binlog-with-tables-info + option which defines whether tables info must be included to the + query events. + -=-=(Knielsen - Fri, 14 Aug 2009, 15:47)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.10896 2009-08-14 15:47:39.000000000 +0300 +++ /tmp/wklog.40.new.10896 2009-08-14 15:47:39.000000000 +0300 @@ -72,3 +72,21 @@ /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... and further processing in mysqlbinlog will be trivial. + +2.4 Implement server functionality to ignore certain tables +----------------------------------------------------------- + +We could add a general facility in the server to ignore certain tables: + + SET SESSION ignored_tables = "db1.t1,db2.t2"; + +This would work similar to --replicate-ignore-table, but in a general way not +restricted to the slave SQL thread. + +It would then be trivial for mysqlbinlog to add such statements at the start +of the output, or probably the user could just do it manually with no need for +additional options for mysqlbinlog. + +It might be useful to integrate this with the code that already handles +--replicate-ignore-db and similar slave options. + -=-=(Psergey - Mon, 10 Aug 2009, 15:41)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.12989 2009-08-10 15:41:23.000000000 +0300 +++ /tmp/wklog.40.new.12989 2009-08-10 15:41:23.000000000 +0300 @@ -1,6 +1,7 @@ - 1. Context ---------- +(See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global +overview) At the moment, the server has these replication slave options: --replicate-do-table=db.tbl -=-=(Guest - Mon, 10 Aug 2009, 14:52)=-=- Dependency created: 39 now depends on 40 -=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=- High Level Description modified. --- /tmp/wklog.40.old.16985 2009-08-10 14:51:59.000000000 +0300 +++ /tmp/wklog.40.new.16985 2009-08-10 14:51:59.000000000 +0300 @@ -1,3 +1,4 @@ Replication slave can be set to filter updates to certain tables with ---replicate-[wild-]{do,ignore}-table options. This task is about adding similar -functionality to mysqlbinlog. +--replicate-[wild-]{do,ignore}-table options. + +This task is about adding similar functionality to mysqlbinlog. -=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=- High-Level Specification modified. --- /tmp/wklog.40.old.16949 2009-08-10 14:51:33.000000000 +0300 +++ /tmp/wklog.40.new.16949 2009-08-10 14:51:33.000000000 +0300 @@ -1 +1,73 @@ +1. Context +---------- +At the moment, the server has these replication slave options: + + --replicate-do-table=db.tbl + --replicate-ignore-table=db.tbl + --replicate-wild-do-table=pattern.pattern + --replicate-wild-ignore-table=pattern.pattern + +They affect both RBR and SBR events. SBR events are checked after the +statement has been parsed, the server iterates over list of used tables and +checks them againist --replicate instructions. + +What is interesting is that this scheme still allows to update the ignored +table through a VIEW. + +2. Table filtering in mysqlbinlog +--------------------------------- + +Per-table filtering of RBR events is easy (as it is relatively easy to extract +the name of the table that the event applies to). + +Per-table filtering of SBR events is hard, as generally it is not apparent +which tables the statement refers to. + +This opens possible options: + +2.1 Put the parser into mysqlbinlog +----------------------------------- +Once we have a full parser in mysqlbinlog, we'll be able to check which tables +are used by a statement, and will allow to show behaviour identical to those +that one obtains when using --replicate-* slave options. + +(It is not clear how much effort is needed to put the parser into mysqlbinlog. +Any guesses?) + + +2.2 Use dumb regexp match +------------------------- +Use a really dumb approach. A query is considered to be modifying table X if +it matches an expression + +CREATE TABLE $tablename +DROP $tablename +UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET' +DELETE ...$tablename ... WHERE // same as above +ALTER TABLE $tablename +.. etc (go get from the grammar) .. + +The advantage over doing the same in awk is that mysqlbinlog will also process +RBR statements, and together with that will provide a working solution for +those who are careful with their table names not mixing with string constants +and such. + +(TODO: string constants are of particular concern as they come from +[potentially hostile] users, unlike e.g. table aliases which come from +[not hostile] developers. Remove also all string constants before attempting +to do match?) + +2.3 Have the master put annotations +----------------------------------- +We could add a master option so that it injects into query a mark that tells +which tables the query will affect, e.g. for the query + + UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ... + + +the binlog will have + + /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... + +and further processing in mysqlbinlog will be trivial. DESCRIPTION: Replication slave can be set to filter updates to certain tables with --replicate-[wild-]{do,ignore}-table options. This task is about adding similar functionality to mysqlbinlog. HIGH-LEVEL SPECIFICATION: 1. Context ---------- (See http://askmonty.org/wiki/index.php/Scratch/ReplicationOptions for global overview) At the moment, the server has these replication slave options: --replicate-do-table=db.tbl --replicate-ignore-table=db.tbl --replicate-wild-do-table=pattern.pattern --replicate-wild-ignore-table=pattern.pattern They affect both RBR and SBR events. SBR events are checked after the statement has been parsed, the server iterates over list of used tables and checks them againist --replicate instructions. What is interesting is that this scheme still allows to update the ignored table through a VIEW. 2. Table filtering in mysqlbinlog --------------------------------- Per-table filtering of RBR events is easy (as it is relatively easy to extract the name of the table that the event applies to). Per-table filtering of SBR events is hard, as generally it is not apparent which tables the statement refers to. This opens possible options: 2.1 Put the parser into mysqlbinlog ----------------------------------- Once we have a full parser in mysqlbinlog, we'll be able to check which tables are used by a statement, and will allow to show behaviour identical to those that one obtains when using --replicate-* slave options. (It is not clear how much effort is needed to put the parser into mysqlbinlog. Any guesses?) 2.2 Use dumb regexp match ------------------------- Use a really dumb approach. A query is considered to be modifying table X if it matches an expression CREATE TABLE $tablename DROP $tablename UPDATE ...$tablename ... SET // here '...' can't contain the word 'SET' DELETE ...$tablename ... WHERE // same as above ALTER TABLE $tablename .. etc (go get from the grammar) .. The advantage over doing the same in awk is that mysqlbinlog will also process RBR statements, and together with that will provide a working solution for those who are careful with their table names not mixing with string constants and such. (TODO: string constants are of particular concern as they come from [potentially hostile] users, unlike e.g. table aliases which come from [not hostile] developers. Remove also all string constants before attempting to do match?) 2.3 Have the master put annotations ----------------------------------- We could add a master option so that it injects into query a mark that tells which tables the query will affect, e.g. for the query UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ... the binlog will have /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... and further processing in mysqlbinlog will be trivial. 2.4 Implement server functionality to ignore certain tables ----------------------------------------------------------- We could add a general facility in the server to ignore certain tables: SET SESSION ignored_tables = "db1.t1,db2.t2"; This would work similar to --replicate-ignore-table, but in a general way not restricted to the slave SQL thread. It would then be trivial for mysqlbinlog to add such statements at the start of the output, or probably the user could just do it manually with no need for additional options for mysqlbinlog. It might be useful to integrate this with the code that already handles --replicate-ignore-db and similar slave options. 2.5 Extend Query Events With Tables Info ---------------------------------------- We could extend query events structure with a tables info - a list of tables which the query refers to: <current query event structure> tables_info_len dbase_len dbase table_len table ... dbase_len dbase table_len table Note. In case of <dbase> = current data base, we can set dbase_len = 0 and dbase = empty because current query event structure already includes current data base name. Note. Possibly it is reasonable also to add a --binlog-with-tables-info option which defines whether tables info must be included to the query events. LOW-LEVEL DESIGN: OPTION: 2.5 Extend Query Events With Tables Info ================================================ 1. Query_log_event Binary Format ******************************** Changes to be done: Query_log_event binary format --------------------------------- Name Size (bytes) --------------------------------- COMMON HEADER: timestamp 4 type 1 server_id 4 total_size 4 master_position 4 flags 2 --------------------------------- POST HEADER: slave_proxy_id 4 exec_time 4 db_len 1 error_code 2 status_vars_len 2 + query_len 2 (see Note 1) + tables_info_len 2 (see Note 2) --------------------------------- BODY: status_vars status_vars_len - db db_len + 1 + db db_len (see Note 3) query query_len + tables_info tables_info binary format --------------------------------- Name Size (bytes) --------------------------------- db_len 1 (see Note 4) db db_len table_name_len 1 table_name table_name_len ... db_len 1 db db_len table_name_len 1 table_name table_name_len NOTES 1. Currently Query_log_event format doesn't include 'query_len' because it considers the query to extent to the end of the event. 2. If tables_info is not included in the event (--binlog-with-tables-info option), tables_info_len = 0. 3. The trailing zero is redundant since the length is already known. 4. In case of db = current db, db_len = 0 and db = empty, because current db is already included in the current event format. 2. Where to get tables info from? ********************************* 2.1. Case study: CREATE TABLE ****************************** *** CREATE TABLE table [SELECT ...] bool mysql_create_table_no_lock( THD *thd, const char *db, const char *table_name, ...) { ... // ------------------------------------- // WL40: To be included in tables_info: // * db, table_name // * thd->lex->query_tables (tables refered to in // the select-part; empty if no select-part) // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } *** CREATE TABLE table LIKE src-table bool mysql_create_like_table( ... TABLE_LIST *table, TABLE_LIST *src_table, ...) { ... if (thd->current_stmt_binlog_row_based) { // RBR: In this case we don't replicate temp tables if (!(create_info->options & HA_LEX_CREATE_TMP_TABLE)) { if (src_table->table->s->tmp_table) { // CREATE normal-table LIKE temp-table: // Generate new query without LIKE-part store_create_info(thd, table, &query, create_info, FALSE); // ------------------------------------- // WL40: To include to tables_info: // * table (src_table is not included) // ------------------------------------- write_bin_log(thd, TRUE, query.ptr(), query.length()); } else { // CREATE normal-table LIKE normal-table // ------------------------------------- // WL40: To include to log_tables_info: // * table // * src_table // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } } // CREATE temp-table LIKE ... // This case is not replicated } else { // SBR: // ------------------------------------- // WL40: To include to tables_info: // * table // * src_table // ------------------------------------- write_bin_log(thd, TRUE, thd->query, thd->query_length); } } To be continued ESTIMATED WORK TIME ESTIMATED COMPLETION DATE ----------------------------------------------------------------------- WorkLog (v3.5.9)

1 0