On 21 Oct 2016, at 17:34, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Simon Mudd <simon.mudd@booking.com> writes:
This would result in higher overhead on each event. There is a fixed header on each event.
Ok. I’ve been assuming the headers were small (from some casual browsing of things related to the binlog router some time ago), but that may be wrong.
Yes, they are quite small, 10-20 bytes per event or something like that.
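(For my own reference, since this came up: the common header in binlog format v4 is 19 bytes. A rough sketch of the layout, with field names of my own choosing rather than the exact identifiers used in the source:)

    #include <cstdint>

    // Rough sketch of the v4 binlog common event header (19 bytes on the wire).
    // Field names are illustrative; the server reads/writes the fields
    // individually by offset rather than through a packed struct like this.
    struct BinlogEventHeader {
        uint32_t timestamp;     // 4 bytes: seconds since epoch when the event was written
        uint8_t  type_code;     // 1 byte : event type (QUERY_EVENT, WRITE_ROWS_EVENT, ...)
        uint32_t server_id;     // 4 bytes: server_id of the originating server
        uint32_t event_length;  // 4 bytes: total size of the event, header included
        uint32_t next_position; // 4 bytes: offset of the next event in the binlog file
        uint16_t flags;         // 2 bytes: e.g. LOG_EVENT_BINLOG_IN_USE_F
    };                          // 4 + 1 + 4 + 4 + 4 + 2 = 19 bytes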
Indeed, one of the things about the current binlog format is that there’s little complete documentation outside of the code. The code changes over time and there’s no clear specification to follow. Things would be much better if what’s currently implicit were made explicit, and if the specs lived outside the code. That’s something I’d very much like to see.
Tell me about it ... it is _very_ hard to change most anything in replication without breaking some odd corner somewhere.
Fixing the case for RBR is good but I feel the focus may be too narrow, especially if the approach can be used more generically.
I certainly have some SBR machines which generate large volumes of binlogs, and being able to compress the events they generate on disk would be most helpful.
Right. This patch compresses query events (i.e. statement-based updates) and row events, so both of these are covered. LOAD DATA INFILE in statement mode is not (but in row-based mode it should be, I think).
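(Just to check my own understanding: I’m guessing the general shape is something like zlib over the event body, with the common header left readable so the file can still be scanned event by event? A rough sketch of what I have in mind, not the patch’s actual code:)

    #include <zlib.h>
    #include <cstdint>
    #include <stdexcept>
    #include <vector>

    // Illustrative only: compress an event body with zlib, leaving the common
    // header alone so tools can still walk the binlog. The real patch has its
    // own event types and framing; this just shows the principle.
    std::vector<unsigned char> compress_event_body(const unsigned char *body,
                                                   size_t body_len,
                                                   int level = Z_DEFAULT_COMPRESSION)
    {
        uLongf dest_len = compressBound(body_len);
        std::vector<unsigned char> out(dest_len);
        if (compress2(out.data(), &dest_len, body, body_len, level) != Z_OK)
            throw std::runtime_error("zlib compress2 failed");
        out.resize(dest_len);   // shrink to the actual compressed size
        return out;
    }

On the slave side the body would obviously need the matching uncompress() before the event is applied.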
I use both. LOAD DATA INFILE is great on the box you load the data into but awful on a downstream slave that “streams down” the data, only to write it out to a temporary file which is then loaded back in again… [ The design is logical, but I’d love to see LOAD DATA INFILE turned directly into an RBR binlog stream, certainly not by default but as an option, as that should reduce load on the downstream slaves, which would no longer have to reprocess the input stream as they do now. ]

I also have “big inserts” of, say, 16 MB SBR events, often of the type INSERT INTO … VALUES … [ON DUPLICATE KEY UPDATE …]. For usage like this the “text” is big, and the individual event compresses pretty well, so the win would be big.

So compression is good and there are several use cases, and making it as generic as possible would benefit more people. That may not be appropriate for this suggested patch, and it’s good to see people offering solutions to their own issues, but it could perhaps be considered as future functionality.

A side effect of a more generic mechanism would hopefully be that this _same_ mechanism could be implemented upstream and would work even if the internal events that go through the “compression pipeline” are different. That avoids feature drift or dual incompatible implementations, which would not be very good and which has happened already (GTID).

Anyway, perhaps I’ve drifted off-topic from your comments on the patch, but this certainly “woke me up” … :-)

Simon
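P.S. If anyone wants to get a feel for the likely win on those big INSERT INTO … VALUES … statements, running the statement text through zlib is a quick sanity check. A throwaway sketch (table and payload are made up, obviously):

    #include <zlib.h>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Throwaway check: build a synthetic multi-row INSERT similar to the big
    // SBR events described above and see how well the statement text compresses.
    int main()
    {
        std::string stmt = "INSERT INTO t (id, payload) VALUES ";
        for (int i = 0; i < 100000; i++)
            stmt += "(" + std::to_string(i) + ",'some repetitive payload text'),";
        stmt.back() = ';';

        uLongf clen = compressBound(stmt.size());
        std::vector<unsigned char> out(clen);
        compress2(out.data(), &clen,
                  reinterpret_cast<const unsigned char *>(stmt.data()),
                  stmt.size(), Z_DEFAULT_COMPRESSION);

        std::printf("original: %zu bytes, compressed: %lu bytes (%.1f%%)\n",
                    stmt.size(), (unsigned long) clen, 100.0 * clen / stmt.size());
        return 0;
    }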