I read throught the worklogs, did some research, and wrote up a number of comments. I am not sure about the best way to proceed, but this is my suggestion. I have written up in this email all my comments in detail. After any discussion, I suggest I update the worklogs as per the conclusions (though if Sergey wants to do that I am happy with that as well). As you will see, this results in a huge email :-( To help this problem, I will write a separate email, laying out the possible options to be chosen between without technical detail, along with guestimates of time needed to implement. ----------------------------------------------------------------------- General comments: 1. RBR vs. SBR. A fundamental decision in several of the worklogs is whether to do something for RBR only, or something that works for both SBR and RBR. Clearly it is preferable to have the features work the same for both RBR and SBR, but it also rather more difficult to implement. There are already a lot of subtle differences between SBR and RBR in the existing similar options for master and slave, motivated by this difficulty I assume. And the problem is even harder to solve in mysqlbinlog, as we lack information available inside the server (table definitions, parsed SQL statement). There is a need to choose between ease of implementation and completeness of feature. This makes it particular important to be consistent in the choices between the different worklogs. We don't want to make the choice one way in one worklog and the other way in another, as this would cause even more confusion for the user. 2. Quoting semantics. Several of the worklogs require the parsing of table names, eg. from command line options or comments. This is a non-trivial problem, as backtick quotes `...` allow more or less any character in database or table names. There may also be issues of charater sets. For example, this is accepted: mysql> create table `'db.table'` (a int); In the RBR events, the names are stored as bytes prefixed with length, so there is no issue of parsing. I assume the character set implied is utf8, though I need to check. In the command line options like --binlog-rewrite-db=db1->db2 and --binlog-do-table=db.t1, we need to decide what to do. I think it is reasonable to not support everything, as long as the semantics of what the option does is clear. For example, assume table names are given in utf8 (irrespectively of locate), and give an error if special characters like "." or "->" is given in the names. Alternatively, we could try to support the full set of valid database and table names, and character sets. Then mysqlbinlog would consult the locale and convert to utf8. And we would need to implement a parser for quoted names, so one can say --binlog-do-table=`'my.strange.db'`.`"my-wierd-table"`, or --binlog-rewrite-db=`a<->b`->`A>-<B'. One disadvantage is that this (probably?) differs from existing --replicate-do-table etc. options. For MWL#40 option 2.3, there is the suggestion to list tables in a comment: /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... In this case the problem is a bit more serious, as strange table names that are not handled correctly could make the binlog totally corrupt (eg. comment end */ in the names). I am really not comfortable with deliberately implementing such a bug, even if it is unlikely to cause any issues in "normal" cases. I am actually not sure about the character set implied for an SBR event. I think maybe it is given by values of system variables, which would make the problem somewhat harder to solve fully. Or maybe statements are converted to utf8 before being logged. I need to check. One option is to give an error if the option to add the comments is enabled and any table names are strange. Not a 100% clean solution, but simple and maybe acceptable. Alternatively we could implement full quoting in the master, and full parsing of quoted names in mysqlbinlog. Though I still need to do more research to know how to handle character set. 3. Option names. I think the mysqlbinlog command-line option names should not begin with --replicate, as they do not really affect replication, though they are of course related to the slave --replicate-* options. Maybe just drop the --replicate prefix, eg. --do-table, --do-table-wild, --rewrite-db, etc. This is consistent with the existing --database option of mysqlbinlog, which has no --replicate prefix. 4. Adding too many options. One thought reading through these worklogs is that we need to be careful that we do not start adding seperate options for every conceivable filtering of the mysqlbinlog output. If this would happen, we should instead add a general way to do this. Certainly fintering options on table and database name is fine. Filtering of statement types (ALTER/ANALYZE/...), MWL#41, is starting to smell a little bit like too much complexity, though maybe it is ok. Something to keep in mind. 5. Security issues for SQL SECURITY INVOKER stored procedures. Some of the worklog options involve general mechanisms to ignore or rewrite certain SQL statements, eg. rewriting one database name to another in MWL#36. This interacts in a problematic way with stored procedures defined with SQL SECURITY INVOKER semantics. Such stored procedures run with the privilege of the user who created them, even if run by users with less privileges. This means that if the statement rewriting applied to code in stored procedures, this could lead to users being able to manipulate such stored procedures to modify different tables that should not be accessible, circumvent logging/auditing, etc. Similar problems may apply to stored functions, triggers, and views, need to check. Something needs to be done to handle this. One option might be to make such statement rewriting have no effect inside stored procedure executing code. Another option would be to restrict such rewriting to the SUPER privilege, though that would somewhat restrict the added mysqlbinlog functionality. ----------------------------------------------------------------------- MWL#36. Add a mysqlbinlog option to change the used database Option 1: Implement --replicate-rewrite-db in mysqlbinlog. This is simple to implement for RBR. The problem is it is hard to implement consistent behaviour for SBR without having the parsed SQL statement. We can do something similar to --replicate-rewrite-db, but it will still apply only to the 'current' database for SBR, while for RBR it will apply both to current database and explicitly mentioned database. See also general comment 1, quoting semantics. Option 2: Add database-agnostic RBR events and --strip-db option I think this is fairly similar to option 1 in terms of end user experience. It will have the same problems of inconsistency between RBR and SBR for current database. It does not support multiple database rewrites (though option 1 also does not work with multiple rewrites for SBR, only for RBR). I prefer option 1, it feels cleaner, is probably simpler to implement (no need to change the server), and I think it solves more or less the same set of end user problems. Option 3: Server-side database rewrite support. This is a general server feature to allow rewriting one database name to another. Some good points about this option: - It is inside the server, so the parsed SQL statement is available and easier to be consistent between RBR and SBR. - It is actually a generally useable feature, not restricted to mysqlbinlog output. I do not like the suggested syntax: /* !database-alias(oldname,newname) */ Why a magic comment? Why not a proper SQL statement? Better to use for example a session variable: SET SESSION rewrite_database "db1->db1a,db2->db2a" This also makes it clear what the scope of the rewrite is, and how to change the mapping. It does have the quoting semantics issue (see general comment 2). The quoting issues could be solved by adding a new kind of statement, like for example: DATABASE ALIAS db1 AS db2, `strage...db` AS `sanename`; But then we have to define the scope of the statement, and I think maybe this is too small a think for adding a new statement. But comments welcome. In any case, I strongly prefer system variable or new statement to magic comment. Security concerns regarding stored procedures apply (general comment 5). ----------------------------------------------------------------------- MWL#37: Add an option to mysqlbinlog to produce SQL script with fewer roundtrips This seems straight-forward enough to implement as described. ----------------------------------------------------------------------- MWL#38: Make mysqlbinlog not to output unneeded statements when using --database Should probably check carefully if there are other statements than COMMIT or SET INSERT_ID that can appear without being useful, and should be eliminated along with these two. Otherwise, this seems straight-forward enough to implement as described. ----------------------------------------------------------------------- MWL#40: Add a mysqlbinlog option to filter updates to certain tables Implement in mysqlbinlog similar options to --replicate-do-table=db.tbl --replicate-ignore-table=db.tbl --replicate-wild-do-table=pattern.pattern --replicate-wild-ignore-table=pattern.pattern (see general comment 3 regarding option names). RBR is simple to do (option 2.0). Main issue is how to handle SBR. Option 2.0: RBR only. This should be easy to implement. Main problem is inconsistency with SBR, see general comment 1. Option 2.1: Embed the SQL parser. There are a number of things that makes this attractive. However, I think it is outside the scope of this particular project (unfortunately). This option would need to be split up in multiple worklogs, the first of which concern(s) removing dependencies in the parser from the rest of the server code. (I don't think embedded server is a solution, as it would probably require having table .frm files, which are not available in mysqlbinlog). Option 2.2: Use dumb regexp match This option I do not like. It basically amounts to re-implementing part of the parser, in a flawed way. It will not work in all cases, and it will be very hard for the user to understand exactly when it works and when it does not. For something like this, I would prefer to implement readable BINLOG statement (MWL#46), and have the users make their own regexp. Then at least they can know the limitations. But a robust solution is still to be prefered. 2.3 Have the master put annotations /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... Quoting semantics is an issue, see general comment 2. I think this should be a server option, off by default, as it is putting redundant information into the binlog. I need to check if the necessary information is available in the server when binlogging, or how it can be made available. The docs for --replicate-wild-do-table says: "Tells the slave thread to restrict replication to statements where any of the updated tables match the specified database and table name patterns" So to be consistent with this, I think for multi-statements like DELETE t1 FROM t1, t2 WHERE <condition>; UPDATE t1, t2 SET t1.a = 1 WHERE <condition>; only t1 should be in the table list in the comment, not t2. 2.4. Implement server functionality to ignore certain tables This is similar to option 3 of MWL#36, so would make sense to choose or reject these together. SET SESSION ignored_tables = "db1.t1,db2.t2"; Again, quoting semantics is an issue (general comment 2). As with option 3 of MWL#36, another possibility is adding a seperate, new SQL statement for this, and again it maybe seems a bit much to add a separate statement for something like this. Security concerns regarding stored procedures apply (general comment 5). I vaguely prefer 2.4 to 2.3 due to being more general and not adding redundant info (which seems a bit hackish); also for consistency with option 3 of MWL#36 (if we choose that one). ----------------------------------------------------------------------- MWL#41: Add a mysqlbinlog option to filter certain kinds of statements Option 1: Embedded parser. (same comment as option 2.1 of MWL#40). Option 2: Partial parsing This amounts to re-implementing part of the parser. I generally do not like the idea of implementing a partial parser from scratch, as it will be inherently error prone to get it to be, and keep it in, 100% sync with the real parser. It will be quite hard to avoid missing one or two special cases where we will miss ignoring a statement in mysqlbinlog due to some special syntax not handled correctly. That being said, it seems this particular case actually could be handled more or less correctly with some effort, as it is not too complex. Basically, it seems the statement is always identifiable from first 1-2 keywords. Still there are issues to handle, like whitespace (might differ depending on character sets, need to check if SBR logs in client charset or utf8), comments, conditional execution /*!40101 <statement */, etc. Option 3: Server-side support. SET SESSION ignored_statements="alter table, analyze table, ..."; The nice thing about this option is that the parsed SQL statement is available. Security concerns regarding stored procedures apply (general comment 5). If we go for option 3 of MWL#36, it would make sense to choose this one as well for consistency. ----------------------------------------------------------------------- - Kristian.