Michael Widenius <monty@askmonty.org> writes:
Note that it would also be very nice to have in the binlog the exact original statement:
Kristian> Yes. There was actually a customer request for this.
Kristian> I think for this we would actually need a new binlog event type Kristian> (Comment_log_event?). Unless we want to log an empty statement Query_log_event Kristian> containing only a comment (a bit of a hack).
I don't think it would be hard to add a tag to the row-binlog-event to add the original statement in such a way that it would be backward compatible. Someone should dig into the protocol spec to see if that would be possible.
A Statement is logged in RBR as one or more Table_map_log_events, followed by one or more {Write,Update,Delete}_rows_log_event. So candidates would be the first (or last) map event, or the first row event. But the event format really is not very extensible (I checked the code), so not trivial to make backwards compatible. Anyway, it should be possible, new event or squeeze into existing, whatever works best.
Kristian> BINLOG Kristian> WITH TIMESTAMP xxx SERVER_ID 1 MASTER_POS 415 FLAGS 0x0 Kristian> TABLE db1.table1 AS 1 COLUMNS (INT NOT NULL, BLOB, VARCHAR(100)) FLAGS 0x0 Kristian> TABLE db2.table2 AS 2 COLUMNS (CHAR(10)) FLAGS 0x0 Kristian> WRITE_ROW INTO db1.table1(1,3) VALUES (42, 'foobar'), (10, NULL) FLAGS 0x2 Kristian> UPDATE_ROW INTO db2.table2 (1) (1) VALUES FROM ('beforeval') TO ('toval'), Kristian> FROM ('a') TO ('b') FLAGS 0x0 Kristian> DELETE_ROW INTO db2.table2 (1) VALUES ('row_to_delete') FLAGS 0x0;
Kristian> This is basically a dump of what is stored in the events, and would be an Kristian> alternative to BINLOG 'gwWEShMBAA...'.
The above would be a much better option than using the current syntax.
After some thinking, my biggest problem with the above is that we would have to keep the syntax up to date and extend it for every new addition Sun makes to the binary log. Over time that may become a big task in itself.
Yes, my main concern too. Though I don't think there will be much addition to the format, as this would in any case break forwards/backwards compatibility in replication. But still a valid concern.
The problem with AWK is that it doesn't work in the general case; For the general case to work, they would need to parse the statements into token and ensure that they only modify things like the database reference.
Yes, using awk (or whatever) is quite fragile (though still useful in many cases). For databases mysqlbinlog can do better, and for tables in RBR, as the names are directly available in the binlog format. For tables in SBR, or type of statement (ALTER/ANALYZE/etc.), mysqlbinlog can do slightly better since it knows where the statement begins, but it still has to rely on some kind of parsing. So conclusion: it is preferable to have 100% reliable database/table/statement filtering than having to rely on fragile awk-like post-processing. Thanks for the input! - Kristian.