[Maria-developers] WL40: Current status & issues to discuss
Bo suggested me to publish my last WL#40 status report on the dev list. Meaning to get a feedback (critical notes, tips, suggestions, etc) I rewrote the report to make it more readable and included some thoughts for discussion. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ WL40. Add a mysqlbinlog option to filter updates to certain tables ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Selected option : "2.5 Extend Query Events With Tables Info". 1. Extending Query Events With Tables Info 2. Example of the Extended Query Event Binary Format 3. Adding binlog-with-tables-info Option 4. How Tables Info is Formed 4.1. CREATE TABLE statement 4.2. RENAME TABLE Statement 5. TODO: Is Tables Info Enough? 6. TODO: How Renames are to be filtered? 7. Other TODO's 1. Extending Query Events With Tables Info ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reason: to do filtering based on "*-table" options we need to have a list of all tables refered to by the query described by Query event. The Query event binary format is extended with a tables info, i.e. the list of tables involved in the query execution. More precisely, two following fields are added to the Query event header: * Query length (2 bytes). (Reason: in usual format, Query event doesn't include this field because the query is considered to extent to the end of the event what is not the case for the extended format). * Tables info length (2 bytes). Length of tables info contained in the data part of the event. and the following data is added to the Query event data part: * Tables info (tables_info_len bytes). List of 4-tuples (db_len, db, table_name_len, table_name) for each table involved in the execution of the query described by this event. db_len may be zero meaning that this is the default db. Note. In the extended format the default db is written without trailing zero which is redundant since its length is already known from the event header. 2. Example of the Extended Query Event Binary Format ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ************************ QUERY_EVENT [2] ************************ 00000294 | 2F A1 00 4B | time_when = 1258332463 00000298 | 02 | event_type = 2 00000299 | 64 00 00 00 | server_id = 100 0000029D | 75 00 00 00 | event_len = 117 000002A1 | 09 03 00 00 | log_pos = 00000309 000002A5 | 10 00 | flags = LOG_EVENT_UPDATE_TABLE_MAP_VERSION_F ------------------------ 000002A7 | 01 00 00 00 | proxy_id = 1 000002AB | C9 00 00 00 | exec_time = 201 000002AF | 04 | dbase_len = 4 000002B0 | 00 00 | error_code = 0 000002B2 | 1A 00 | status_vars_len = 26 000002B4 | 2B 00 | query_len = 43 (*) 000002B6 | 08 00 | tables_info_len = 8 (*) ------------------------ Status Variables ------------------------ 000002B8 | 00 | FLAGS2: 000002B9 | 00 40 00 00 | 0x00004000 000002BD | 01 | SQL_MODE: 000002BE | 00 00 00 00 | 0x0000000000000000 000002C2 | 00 00 00 00 | 000002C6 | 06 | CATALOG_NZ: 000002C7 | 03 | catalog_len = 3 000002C8 | 73 74 64 | catalog = "std" 000002CB | 04 | CHARSET_CODE: 000002CC | 08 00 | charset_client = 8 000002CE | 08 00 | collation_connect = 8 000002D0 | 08 00 | collation_server = 8 ------------------------ 000002D2 | 74 65 73 74 | dbase = "test" (**) 000002D6 | 52 45 4E 41 | query = "RENAME TABLE t1 TO tmp, t2 TO t1, tmp TO t2" 000002DA | 4D 45 20 54 | 000002DE | 41 42 4C 45 | 000002E2 | 20 74 31 20 | 000002E6 | 54 4F 20 74 | 000002EA | 6D 70 2C 20 | 000002EE | 74 32 20 54 | 000002F2 | 4F 20 74 31 | 000002F6 | 2C 20 74 6D | 000002FA | 70 20 54 4F | 000002FE | 20 74 32 | ------------------------ Tables info (*) ------------------------ 00000301 | 00 | dbase_len = 0 (***) 00000302 | 02 | table_len = 2 00000303 | 74 31 | table_str = "t1" 00000305 | 00 | dbase_len = 0 (***) 00000306 | 02 | table_len = 2 00000307 | 74 32 | table_str = "t2" ************************ (*) - Added to the new Query event binary format. (**) - Modified: in the new Query event binary format database name does not contain trailing zero. (***) - Database length = 0 means that it is the default database. 3. Adding binlog-with-tables-info Option ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ New --binlog-with-tables-info option is added. When set, all Query events are to be written to the extended format (with tables info). Reasons for adding the option: * Adding tables info leads binary log to grow in size what is superfluous if filtering within mysqlbinlog is not supposed to be used. * If the current implementation of the extended format will appear to have bugs switching off this option will guarantee that replication will work as before that. The option is defined as GLOBAL and read-only. Reason: * The Query events format used in the binary log can be recognized only by its length contained in the Format description event at the beginning of the binary log. Because of that, two different Query event formats can not be mixed within one binary log. 4. How Tables Info is Formed ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Forming tables info for some types of statement is not quite straightforward. Below is two examples of those "not-straightforward" cases. 4.1. CREATE TABLE statement ~~~~~~~~~~~~~~~~~~~~~~~~~~~ CREATE [TEMPORARY] TABLE [IF NOT EXISTS] t (...) CREATE [TEMPORARY] TABLE [IF NOT EXISTS] t LIKE t1 CREATE [TEMPORARY] TABLE [IF NOT EXISTS] t SELECT (t1, ..., tn) Implementation of forming tables info in this case is based on the following rules of binlogging CREATE TABLE statements: SBR. The original CREATE TABLE statement is always binlogged as is. In this case tables_info = { t }, { t, t1 }, or { t, t1, ..., tn } respectively. RBR. In case of RBR we have the following specifics of binlogging CREATE TABLE statement: * temporary tables are not binlogged; * CREATE TABLE t LIKE t1 with temporary t1 is transformed to the ordinary CREATE TABLE (...); * CREATE TABLE SELECT is always transformed to the ordinary CREATE TABLE (...) (with possible subsequent INSERT statements). So for RBR tables info is formed as follows: IF t is temporary // the statement will not be binlogged tables_info = empty IF CREATE TABLE t (...) // the statement will be binlogged as is tables_ info = {t} IF CREATE TABLE t LIKE t1 IF t1 is temporary // the statement will be binlogged as CREATE TABLE t (...) tables_info = {t} ELSE // the statement will be binlogged as is tables_info = {t, t1} IF CREATE TABLE t SELECT (t1, ..., tn) // the statement will be binlogged as CREATE TABLE (...) tables_info = {t} Note. In case of RBR, CREATE TABLE SELECT is binlogged as the following sequence of events: BEGIN: Query event CREATE TABLE: Query event INSERT's: Write row events COMMIT Query event BEGIN and COMMIT in this case are supplied with the same tables_info = {t} as the CREATE TABLE statement. Also INSERT's are preceded by the Table map event refering to the same table t. So filtering of all this events (including BEGIN and COMMIT) will be based on the same tables info and either all of them will pass the filter or all of them will be filtered out. 4.2. RENAME TABLE Statement ~~~~~~~~~~~~~~~~~~~~~~~~~~~ RENAME TABLE tl TO s1, ..., tn TO sn For this type of statements tables info is formed, roughly speaking, of tables standing left to "TO", i.e. t1, ..., tn. But including all t1, ..., tn is not always correct. Consider renaming t1 <-> t2: RENAME TABLE t1 TO tmp, t2 TO t1, tmp TO t2 Setting tables_info = ( t1, t2, tmp } is not good: mysqlbinlog being executed with do-tables = { t1, t2 } will filter this statement away because tmp does not belong to do-tables. So before adding ti to the tables_info we have to check that ti != sj for all j < i. See tables_info in the example of Section 2. 5. TODO: Is Tables Info Enough? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Assume we run mysqlbinlog with ignore_tables = { t1 } for the following sequence of statements: (1) CREATE TABLE t1 (...); ... (2) CREATE TABLE t2 LIKE t1; ... (3) INSERT INTO t2 VALUES ...; ... In this case: (1) will be filtered away - directly stated by the ignore rules; (2) will be filtered away - can not be executed because refers to t1 (3) will be left - though can not be executed It is not good to have erroneous statements in mysqlbinlog output. We can overcome this in two ways: 1. To explain to the user that in those cases, to avoid erroneous statements, he/she should add t2 to the ignore_tables list as well. 2. To make the filtering bit clever. E.g., additionally to tables info, we may extend the Query event format with a Statement type field (BTW, see also WL#41 "Add a mysqlbinlog option to filter certain kinds of statements"). In this case, when filtering, if we reject a "CREATE TABLE t ..." statement because its LIKE or SELECT part refers to an ignored table, we add table t to the ignore_tables list ourself. 6. TODO: How Renames are to be filtered? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Consider the RENAME TABLE statement. If we have do_tables = { t } and t is renamed to s: (1) RENAME TABLE t TO s; <smth for s> what the filter should do? Should we consider t and s to be the same table with different names so that <smth for s> is not to be filtered away (in this case during filtering process we should add s to the do_tables list - like we do it in the case of CREATE TABLE)? Or should we treat do- ignore-table rules as talking about *table names* so that --do-table=t means "do only table with name t" - then we filter <smth for s> away because it refers to the table with another name s? Note that in the last case if we rename s back to t: (2) RENAME TABLE s TO t; <smth for t> the RENAME statement will be filtered away (it renames a table which does not belong to the do_tables list) while <smth for t> will pass the filter and its subsequent execution will result with error (we can resolve this by creating a rename_list during the filtering process to trace all renamings specified by the processed events - similar to what is described in Section 4.2). Note. With the second treatment of renaming, we can do the following (looks like attractive but not sure is usefull :) <smth for t> RENAME TABLE t TO s <smth for s that can be filtered away> RENAME TABLE s TO t <smth for t> 7. Other TODO's ~~~~~~~~~~~~~~~ * Continue with forming tables_info for different types of query. * Filtering algorithm and *-table options for mysqlbinlog. Should it be "clever" or "stupid" (see Sections 5 & 6) ? * Add Statement type field to the Query event format (?) Needed for "clever' filtering and for WL#41 * Other stuff (testing, documenting, etc)
Alexi1952 <Alexi1952@yandex.ru> writes:
The Query event binary format is extended with a tables info, i.e. the list of tables involved in the query execution. More precisely, two following fields are added to the Query event header:
* Query length (2 bytes). (Reason: in usual format, Query event doesn't include this field because the query is considered to extent to the end of the event what is not the case for the extended format).
* Tables info length (2 bytes). Length of tables info contained in the data part of the event.
and the following data is added to the Query event data part:
* Tables info (tables_info_len bytes). List of 4-tuples (db_len, db, table_name_len, table_name) for each table involved in the execution of the query described by this event. db_len may be zero meaning that this is the default db.
This sounds somewhat dangerous. We need to consider the effect on compatibility with other servers which do not have this change. As far as I understand, MySQL replication is backwards and forwards compatible. This means that an old slave can replicate from a new master, and a new slave can replicate from an old master. I guess there must be some kind of limitations for this, but I remember seeing lots of code in the replication sources to handle such issues. Therefore, when proposing changes to the event format, we need a full analysis of the consequences for replicating from/to different server versions. We also need to consider replication to/from Sun MySQL, which probably means some kind of coordination with Sun replication engineers. (I'm not saying that we cannot change replication event formats, just that we need to do it really carefully). Also, don't know if this worklog is being actively worked on, just wanted to remark this to be sure it is not forgotten. Hope this helps, - Kristian.
participants (2)
-
Alexi1952
-
Kristian Nielsen