Simon Mudd <simon.mudd@booking.com> writes:
I have not looked at the code in detail but a couple of questions related to the implementation come to mind:
The author is probably the best to answer, but I can tell what I know from reviewing the patch:
(1) Why not use a single compressed event type and encapsulate the whole original event inside ?
This would result in higher overhead on each event. There is a fixed header to each event, which cannot be compressed (without significantly more changes to the code, at least). Thus, a single event encapsulating another full compressed event would require two headers, one uncompressed and one compresed. There might be other pros and cons to each method, this was just what occured to me.
Is it not possible to have a "generic compressed event”? Mark mentions allowing selection of the compression type and that seems useful and I would expect the the compressed event to have this information and the compressed content.
The current patch has reserved three bits for the compression method. Thus, it would be possible in the future to add more compression types (up to a maximum of 8, at least). Actually, it would be good if the code would check the compression type and fail if it is unknown. This will help get proper errors on old slaves if new compression types are added in the future.
(2) Senders would try to compress the event as requested. If the compressed event is not any smaller then do not bother compressing it, just send the original event.
This is not in the current patch, but it could be added easily enough. It might not be needed though, the log_bin_compress_min_len kind of serves a similar purpose.
(3) Receivers on receiving a compressed event would decompress it and then process the extracted event in the same way it’s processed now. Most of the code should haven no idea the event had been compressed.
That's basically how the patch is written.
Perhaps I am missing something but the need to compress events individually probably requires quite a bit more logic and will not be as extensible.
Perhaps. In any case, the patch at hand is based on individually compressed events. Even with individual event types, the patch is nice and clean and leaves much of the existing logic unchanged. My guess is that this will primarily be useful for row-based binlogging, where reasonable compression rates can be expected in many cases where row-based binlogging has particularly high overhead compared to statement-based. Per-event compression is in any case of limited use for events that are small in size. - Kristian.