Re: [Maria-developers] [MariaDB/server] Compressed binary log (#247)
GCSAdmin <notifications@github.com> writes:
We add new event types to support compressing the binlog, as follows: QUERY_COMPRESSED_EVENT, WRITE_ROWS_COMPRESSED_EVENT_V1, UPDATE_ROWS_COMPRESSED_EVENT_V1, DELETE_ROWS_COMPRESSED_EVENT_V1, WRITE_ROWS_COMPRESSED_EVENT, UPDATE_ROWS_COMPRESSED_EVENT, DELETE_ROWS_COMPRESSED_EVENT
Thanks for the patch!

Overall, I'm much impressed with the quality, I see a lot of attention to detail. Below, I have a number of comments and suggestions, but they are all minor for a patch of this complexity.

A number of the comments are style fixes or simple changes, and I found it easier to just do the changes to the source for you to review. I hope that is ok. I rebased the series without intermediate merge to simplify review to a single patch. The rebase with my suggested changes on top is here:

https://github.com/knielsen/server/commits/GCSAdmin-10.2-binlog-compressed2-...

Please check the changes I made and let me know if you disagree with anything or I misunderstood something. Changes are mostly:

 - A bunch of code style (indentation, lines <= 80 chars, clarify comments, and so on).
 - Change LOG_EVENT_IS_QUERY() etc. from macros to static inline functions.
 - check_event_type() updated (new code merged into 10.2 recently).
 - Minor .result file update, also from code merged into 10.2 recently.

I also have the following questions/suggestions:

1. I think the code should sanity-check the compressed event header, to check that it does not access outside the event buffer in corrupt events. Eg. if lenlen is 7 and there is less than 7 bytes available in the event. I know that replication code in general is not robust to corrupt events, but it seems best that new code avoids overflowing buffers in case of corrupt event data.

2. There are some places where the compressed event types are not added to switch() or if() statements, mainly related to minimal binlog images, I think: Rows_log_event::read_write_bitmaps_cmp(), Rows_log_event::get_data_size(), Rows_log_event::do_apply_event(), Rows_log_event::write_data_body(). I *think* the reason for that is that the SQL thread never sees the compressed events (they are uncompressed by the IO thread), but I would like your confirmation that my understanding is correct.

3. Did you think about testing that BINLOG statements with compressed events can execute correctly? This happens with mysqlbinlog | mysql, when there are (compressed) row-based events in the binlog. I'm wondering if this could expose compressed events to code that normally runs in the SQL thread and expects to have events uncompressed for it by the IO thread?

A few detailed comments on the patch below (most comments are done as changes in the above-linked git branch). Otherwise, after answer to above 3 questions, I think the patch looks good to go into 10.2.

 - Kristian.

-----------------------------------------------------------------------
+#define BINLOG_COMPRESSED_HEADER_LEN 1
+#define BINLOG_COMPRESSED_ORIGINAL_LENGTH_MAX_BYTES 4
+/**
+  Compressed Record
+    Record Header: 1 Byte
+             0 Bit: Always 1, mean compressed;
+           1-3 Bit: Reversed, compressed algorithm, Always 0, means zlib
+           4-7 Bit: Bytes of "Record Original Length"
+    Record Original Length: 1-4 Bytes
+    Compressed Buf:
I did not understand this. Why "reversed"? It seems to be bits 0-2 that have the bytes of "Record Original Length" and bit 7 that has the '1' bit meaning compression enabled? Maybe this can be clarified.
+int query_event_uncompress(const Format_description_log_event *description_event, bool contain_checksum,
+                           const char *src, char* buf, ulong buf_size, bool* is_malloc,
+                           char **dst, ulong *newlen)
+{
+  ulong len = uint4korr(src + EVENT_LEN_OFFSET);
+  const char *tmp = src;
+
+  DBUG_ASSERT((uchar)src[EVENT_TYPE_OFFSET] == QUERY_COMPRESSED_EVENT);
+
+  uint8 common_header_len= description_event->common_header_len;
+  uint8 post_header_len= description_event->post_header_len[QUERY_COMPRESSED_EVENT-1];
+
+  tmp += common_header_len;
+
+  uint db_len = (uint)tmp[Q_DB_LEN_OFFSET];
+  uint16 status_vars_len= uint2korr(tmp + Q_STATUS_VARS_LEN_OFFSET);
+
+  tmp += post_header_len + status_vars_len + db_len + 1;
+
+  uint32 un_len = binlog_get_uncompress_len(tmp);
+  *newlen = (tmp - src) + un_len;
This is one place I would like to see a check on the data in the event. Maybe just that 'len' is larger than 1+(tmp[0]&7). Or maybe simply that len is at least 10, the minimal size for compressed events.
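To illustrate the kind of check being asked for, here is a minimal standalone sketch (the helper name is hypothetical, not part of the patch) that verifies the event buffer can hold the 1-byte compressed-record header plus the declared number of original-length bytes before they are read:

#include <cstddef>

/*
  Hypothetical illustration of the suggested sanity check: before trusting
  the compressed-record header at 'tmp', verify that the event of
  'event_len' bytes starting at 'src' still has room for the 1-byte header
  and the 1-4 bytes of "Record Original Length" that the header announces.
*/
static bool compressed_header_fits(const unsigned char *src, size_t event_len,
                                   const unsigned char *tmp)
{
  size_t used= (size_t)(tmp - src);        /* bytes already consumed */
  if (event_len < used + 1)                /* no room for the header byte */
    return false;
  size_t lenlen= tmp[0] & 0x07;            /* bytes of the original-length field */
  return event_len >= used + 1 + lenlen;   /* header + length bytes must fit */
}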
+int row_log_event_uncompress(const Format_description_log_event *description_event, bool contain_checksum,
+                             const char *src, char* buf, ulong buf_size, bool* is_malloc,
+                             char **dst, ulong *newlen)
+{
+  uint32 un_len = binlog_get_uncompress_len(tmp);
+  *newlen = (tmp - src) + un_len;
Another place where I'd like a check against looking outside the event buffer.
+int binlog_buf_uncompress(const char *src, char *dst, uint32 len, uint32 *newlen)
+{
+  if((src[0] & 0x80) == 0)
+  {
+    return 1;
+  }
+
+  uint32 lenlen = src[0] & 0x07;
+  uLongf buflen = *newlen;
+  if(uncompress((Bytef *)dst, &buflen, (const Bytef*)src + 1 + lenlen, len) != Z_OK)
Again check on length.
+#define BINLOG_COMPRESSED_HEADER_LEN 1
+#define BINLOG_COMPRESSED_ORIGINAL_LENGTH_MAX_BYTES 4
+/**
+  Compressed Record
+    Record Header: 1 Byte
+             0 Bit: Always 1, mean compressed;
+           1-3 Bit: Reversed, compressed algorithm, Always 0, means zlib
Good to see this progressed from the old MySQL bug I saw originally. I'm a bit late into this, however I suggest we can use a better algorithm than zlib. There are a significant number of algorithms that may offer better compression at similar speed, or faster, like brotli, zstd, snappy.

https://quixdb.github.io/squash-benchmark/#results-table
How easy will it be for this to use a lower latency compression algorithm like lz4, Snappy or zstandard?

On Thu, Oct 20, 2016 at 2:52 PM, Daniel Black <daniel.black@au1.ibm.com> wrote:
+#define BINLOG_COMPRESSED_HEADER_LEN 1
+#define BINLOG_COMPRESSED_ORIGINAL_LENGTH_MAX_BYTES 4
+/**
+  Compressed Record
+    Record Header: 1 Byte
+             0 Bit: Always 1, mean compressed;
+           1-3 Bit: Reversed, compressed algorithm, Always 0, means zlib
Good to see this progressed from the old MySQL bug I saw originally.
I'm a bit late into this, however I suggest we can use a better algorithm than zlib. There are a significant number of algorithms that may offer better compression at similar speed, or faster, like brotli, zstd, snappy.
https://quixdb.github.io/squash-benchmark/#results-table
-- Mark Callaghan mdcallag@gmail.com
Hi Mark,

There are 3 bits reserved for the compression algorithm in the Compressed Record. It is always 0 at present, meaning zlib. It will be easy to support other compression algorithms in the future.

Compressed Record
  Record Header: 1 Byte
      7 Bit: Always 1, means compressed;
    4-6 Bit: Reserved for the compression algorithm. Now it is always 0, meaning zlib. It may support other compression algorithms in the future.
    0-3 Bit: Bytes of "Record Original Length"
  Record Original Length: 1-4 Bytes
  Compressed Buf:

2016-10-21 7:32 GMT+08:00 MARK CALLAGHAN <mdcallag@gmail.com>:
How easy will it be for this to use a lower latency compression algorithm like lz4, Snappy or zstandard?
On Thu, Oct 20, 2016 at 2:52 PM, Daniel Black <daniel.black@au1.ibm.com> wrote:
+#define BINLOG_COMPRESSED_HEADER_LEN 1
+#define BINLOG_COMPRESSED_ORIGINAL_LENGTH_MAX_BYTES 4
+/**
+  Compressed Record
+    Record Header: 1 Byte
+             0 Bit: Always 1, mean compressed;
+           1-3 Bit: Reversed, compressed algorithm, Always 0, means zlib
Good to see this progressed from the old MySQL bug I saw originally.
I'm a bit late into this, however I suggest we can use a better algorithm than zlib. There are a significant number of algorithms that may offer better compression at similar speed, or faster, like brotli, zstd, snappy.
https://quixdb.github.io/squash-benchmark/#results-table
-- Mark Callaghan mdcallag@gmail.com
Hi,

Compressed binlogs are something that have interested me for some time [1], so it is nice to see patches which provide this functionality. I have not looked at the code in detail, but a couple of questions related to the implementation come to mind:

(1) Why not use a single compressed event type and encapsulate the whole original event inside? Is it not possible to have a "generic compressed event"? Mark mentions allowing selection of the compression type and that seems useful, and I would expect the compressed event to have this information and the compressed content.

(2) Senders would try to compress the event as requested. If the compressed event is not any smaller then do not bother compressing it, just send the original event.

(3) Receivers, on receiving a compressed event, would decompress it and then process the extracted event in the same way it's processed now. Most of the code should have no idea the event had been compressed.

(4) This logic would only require a single new event type and also allow all types of events (including any future ones) to be compressed without requiring extra logic.

Perhaps I am missing something, but the need to compress events individually probably requires quite a bit more logic and will not be as extensible.

Thanks for sharing any thoughts on this.

Regards,

Simon

[1] See: https://bugs.mysql.com/bug.php?id=71696 and various other FRs related to this topic.
Simon Mudd <simon.mudd@booking.com> writes:
I have not looked at the code in detail but a couple of questions related to the implementation come to mind:
The author is probably the best to answer, but I can tell what I know from reviewing the patch:
(1) Why not use a single compressed event type and encapsulate the whole original event inside ?
This would result in higher overhead on each event. There is a fixed header to each event, which cannot be compressed (without significantly more changes to the code, at least). Thus, a single event encapsulating another full compressed event would require two headers, one uncompressed and one compressed. There might be other pros and cons to each method, this was just what occurred to me.
Is it not possible to have a "generic compressed event”? Mark mentions allowing selection of the compression type and that seems useful and I would expect the compressed event to have this information and the compressed content.
The current patch has reserved three bits for the compression method. Thus, it would be possible in the future to add more compression types (up to a maximum of 8, at least). Actually, it would be good if the code would check the compression type and fail if it is unknown. This will help get proper errors on old slaves if new compression types are added in the future.
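A rough sketch of what such a check could look like, under the assumption (taken from the masks in the quoted code, not from the patch itself) that the algorithm occupies whatever header bits are not used by the flag or the length field:

/*
  Hypothetical check of the reserved compression-algorithm bits: the quoted
  code only tests the flag (0x80) and the length-byte count (0x07), so a
  reader could refuse any header whose remaining bits are non-zero, i.e.
  any algorithm other than the only one currently defined (0 = zlib).
*/
static bool compression_algorithm_supported(unsigned char header_byte)
{
  return (header_byte & ~(0x80u | 0x07u)) == 0;
}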
(2) Senders would try to compress the event as requested. If the compressed event is not any smaller then do not bother compressing it, just send the original event.
This is not in the current patch, but it could be added easily enough. It might not be needed though, the log_bin_compress_min_len kind of serves a similar purpose.
(3) Receivers, on receiving a compressed event, would decompress it and then process the extracted event in the same way it’s processed now. Most of the code should have no idea the event had been compressed.
That's basically how the patch is written.
Perhaps I am missing something but the need to compress events individually probably requires quite a bit more logic and will not be as extensible.
Perhaps. In any case, the patch at hand is based on individually compressed events. Even with individual event types, the patch is nice and clean and leaves much of the existing logic unchanged. My guess is that this will primarily be useful for row-based binlogging, where reasonable compression rates can be expected in many cases where row-based binlogging has particularly high overhead compared to statement-based. Per-event compression is in any case of limited use for events that are small in size. - Kristian.
Hi Kristian,
On 21 Oct 2016, at 13:22, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Simon Mudd <simon.mudd@booking.com> writes:
I have not looked at the code in detail but a couple of questions related to the implementation come to mind:
The author is probably the best to answer, but I can tell what I know from reviewing the patch:
(1) Why not use a single compressed event type and encapsulate the whole original event inside ?
This would result in higher overhead on each event. There is a fixed header to each event, which cannot be compressed (without significantly more changes to the code, at least). Thus, a single event encapsulating another full compressed event would require two headers, one uncompressed and one compressed.
Ok. I’ve been assuming the headers were small (from some casual browsing of things related to the binlog router some time ago), but that may be wrong.
There might be other pros and cons to each method, this was just what occured to me.
Is it not possible to have a "generic compressed event”? Mark mentions allowing selection of the compression type and that seems useful and I would expect the the compressed event to have this information and the compressed content.
The current patch has reserved three bits for the compression method. Thus, it would be possible in the future to add more compression types (up to a maximum of 8, at least).
Ok. good to know.
Actually, it would be good if the code would check the compression type and fail if it is unknown. This will help get proper errors on old slaves if new compression types are added in the future.
Indeed, one of the things about the current binlog format is that there’s little complete documentation outside of the code. Code changes and there’s no clear specification. It makes things much better if what’s currently implicit is explicit and also if the specs are outside of the code. That’s something I tried to mention to Oracle with the new X protocol and obviously out of scope of this discussion but the intricacies of the binlog traffic are quite hard to follow for mere mortals…
(2) Senders would try to compress the event as requested. If the compressed event is not any smaller then do not bother compressing it, just send the original event.
This is not in the current patch, but it could be added easily enough. It might not be needed though, the log_bin_compress_min_len kind of serves a similar purpose.
It just saves the receiver doing extra work and I’d expect that allowing that would be trivial even if not implemented now. Just a thought anyway. …
Perhaps I am missing something but the need to compress events individually probably requires quite a bit more logic and will not be as extensible.
Perhaps. In any case, the patch at hand is based on individually compressed events. Even with individual event types, the patch is nice and clean and leaves much of the existing logic unchanged.
My guess is that this will primarily be useful for row-based binlogging, where reasonable compression rates can be expected in many cases where row-based binlogging has particularly high overhead compared to statement-based. Per-event compression is in any case of limited use for events that are small in size.
Sure, so workload clearly affects the usefulness of a patch such as this one. That said, I've seen SBR traffic with very large packets as many bulk changes are massaged to fit to max_allowed_packet, so here even in SBR you can make a huge size saving if you have such statements.

Minimal RBR can help reduce some of the space concerns that RBR generates, and I use it a lot (and requested this functionality) specifically for this reason. However there are use cases where it can not be used (exporting data to external systems that may need/want to have the whole “change log stream”), so clearly compression here may be a good bit of extra functionality.

Fixing the case for RBR is good but I feel the focus may be too narrow, especially if the approach can be used more generically. I certainly have some SBR machines which generate large volumes of bin logs and to be able to compress the events they generate on disk would be most helpful.

Thanks for sharing your thoughts.

Simon
Simon Mudd <simon.mudd@booking.com> writes:
This would result in higher overhead on each event. There is a fixed header
Ok. I’ve been assuming the headers were small (from some casual browsing of things related to the binlog router some time ago), but that may be wrong.
Yes, they are quite small, 10-20 bytes per event or something like that.
Indeed, one of the things about the current binlog format is that there’s little complete documentation outside of the code. Code changes and there’s no clear specification. It makes things much better if what’s currently implicit is explicit and also if the specs are outside of the code. That’s something I
Tell me about it ... it is _very_ hard to change most anything in replication without breaking some odd corner somewhere.
Fixing the case for RBR is good but I feel the focus may be too narrow, especially if the approach can be used more generically.
I certainly have some SBR machines which generate large volumes of bin logs and to be able to compress the events they generate on disk would be most helpful.
Right. This patch compresses query events (ie. statement-based updates) and row-events, so both of these are covered. LOAD DATA INFILE in statement mode is not (but in row-based mode it should be, I think). - Kristian.
On 21 Oct 2016, at 17:34, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Simon Mudd <simon.mudd@booking.com> writes:
This would result in higher overhead on each event. There is a fixed header
Ok. I’ve been assuming the headers were small (from some casual browsing of things related to the binlog router some time ago), but that may be wrong.
Yes, they are quite small, 10-20 bytes per event or something like that.
Indeed, one of the things about the current binlog format is that there’s little complete documentation outside of the code. Code changes and there’s no clear specification. It makes things much better if what’s currently implicit is explicit and also if the specs are outside of the code. That’s something I
Tell me about it ... it is _very_ hard to change most anything in replication without breaking some odd corner somewhere.
Fixing the case for RBR is good but I feel the focus may be too narrow, especially if the approach can be used more generically.
I certainly have some SBR machines which generate large volumes of bin logs and to be able to compress the events they generate on disk would be most helpful.
Right. This patch compresses query events (ie. statement-based updates) and row-events, so both of these are covered. LOAD DATA INFILE in statement mode is not (but in row-based mode it should be, I think).
I use both. LOAD DATA INFILE is great on the box you load the data into but awful on a downstream slave that "streams down” the data, only to output it to a temporary file which is loaded back in again…

[ The design is logical but I’d love to see the LOAD DATA INFILE turned directly into a RBR binlog stream, certainly not by default, but as an option, as that should reduce load on the downstream slaves which would not have to reprocess the input stream as they do now. ]

I also have “big inserts” of, say, 16 MB SBR events, often of the type: INSERT INTO … VALUES … [ON DUPLICATE KEY UPDATE …]. For usage such as this the “text” is big, and so the individual event does compress pretty well, so the win would be big.

So compression is good and there are several use cases, thus making it as generic as possible would benefit more people. That may not be appropriate for this suggested patch, and it’s good to see people offering solutions to their own issues, but at least it could perhaps be considered as future functionality.

A side effect of a more generic mechanism would hopefully be that this _same_ mechanism could be implemented upstream and would work even if the internal events that go through the "compression pipeline” are different. That avoids feature drift or dual incompatible implementations, which would not be very good and has happened already (GTID).

Anyway, perhaps I’ve drifted off-topic for your comments on the patch, but this certainly “woke me up” … :-)

Simon
Thanks for knielsen's nice reply. I would like to add some comments.

2016-10-21 22:31 GMT+08:00 Simon Mudd <simon.mudd@booking.com>:
Hi Kristian,
On 21 Oct 2016, at 13:22, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Simon Mudd <simon.mudd@booking.com> writes:
I have not looked at the code in detail but a couple of questions related to the implementation come to mind:
The author is probably the best to answer, but I can tell what I know from reviewing the patch:
(1) Why not use a single compressed event type and encapsulate the whole original event inside ?
This would result in higher overhead on each event. There is a fixed header to each event, which cannot be compressed (without significantly more changes to the code, at least). Thus, a single event encapsulating another full compressed event would require two headers, one uncompressed and one compressed.
Ok. I’ve been assuming the headers were small (from some casual browsing of things related to the binlog router some time ago), but that may be wrong.
Now, the individual compressed event types inherit from the original event types, so they do not need much extra logic. If we used a single compressed event type and encapsulated the whole original event inside, I don't think it would be easy to do without a lot of code modification. For example:

bool Query_log_event::write()
{
  ...
  return write_header(event_length) ||
         write_data(buf, QUERY_HEADER_LEN) ||
         write_post_header_for_derived() ||
         write_data(start_of_status, (uint) (start-start_of_status)) ||
         write_data(safe_str(db), db_len + 1) ||
         write_data(query, q_len) ||
         write_footer();
}

And I agree that it may bring higher overhead.
There might be other pros and cons to each method, this was just what occurred to me.
Is it not possible to have a "generic compressed event”? Mark mentions allowing selection of the compression type and that seems useful and I would expect the the compressed event to have this information and the compressed content.
The current patch has reserved three bits for the compression method. Thus, it would be possible in the future to add more compression types (up to a maximum of 8, at least).
Ok. good to know.
Actually, it would be good if the code would check the compression type and fail if it is unknown. This will help get proper errors on old slaves if new compression types are added in the future.
Indeed, one of the things about the current binlog format is that there’s little complete documentation outside of the code. Code changes and there’s no clear specification. It makes things much better if what’s currently implicit is explicit and also if the specs are outside of the code. That’s something I tried to mention to Oracle with the new X protocol and obviously out of scope of this discussion but the intricacies of the binlog traffic are quite hard to follow for mere mortals…
And the new code would check the compression type.
(2) Senders would try to compress the event as requested. If the compressed event is not any smaller then do not bother compressing it, just send the original event.
This is not in the current patch, but it could be added easily enough. It might not be needed though, the log_bin_compress_min_len kind of serves a similar purpose.
It just saves the receiver doing extra work and I’d expect that allowing that would be trivial even if not implemented now. Just a thought anyway.
As knielsen said, the main purpose of log_bin_compress_min_len is to avoid compressing binlog events that are too small.
…
Perhaps I am missing something but the need to compress events individually probably requires quite a bit more logic and will not be as extensible.
Perhaps. In any case, the patch at hand is based on individually compressed events. Even with individual event types, the patch is nice and clean and leaves much of the existing logic unchanged.
My guess is that this will primarily be useful for row-based binlogging, where reasonable compression rates can be expected in many cases where row-based binlogging has particularly high overhead compared to statement-based. Per-event compression is in any case of limited use for events that are small in size.
Sure so workload clearly affects the usefulness of a patch such as this one. That said I’ve seen SBR traffic with very large packets as many bulk changes are massaged to fit to max_allowed_packet, so here even in SBR you can make a huge size saving if you have such statements.
Minimal RBR can help reduce some of the space concerns that RBR generates, and I use it a lot (and requested this functionality) specifically for this reason. However there are use cases where it can not be used (exporting data to external systems that may need/want to have the whole “change log stream”) so clearly compression here may be a good bit of extra functionality.
Fixing the case for RBR is good but I feel the focus may be too narrow, especially if the approach can be used more generically.
I certainly have some SBR machines which generate large volumes of bin logs and to be able to compress the events they generate on disk would be most helpful.
Thanks for sharing your thoughts.
Simon
SBR also works with this feature.
Hi,

On 28 Oct 2016, at 06:18, 陈福荣 <vinchen13@gmail.com> wrote:

Thanks for knielsen's nice reply. I would like to add some comments …
(2) Senders would try to compress the event as requested. If the compressed event is not any smaller then do not bother compressing it, just send the original event.
This is not in the current patch, but it could be added easily enough. It might not be needed though, the log_bin_compress_min_len kind of serves a similar purpose.
It just saves the receiver doing extra work and I’d expect that allowing that would be trivial even if not implemented now. Just a thought anyway.

As knielsen said, the main purpose of log_bin_compress_min_len is to avoid compressing binlog events that are too small.

Can’t you just find out how big the compressed event is and, if it’s not smaller than the original event, send the original? That way you know you get some extra compression, which if you’re enabling this feature is what you want anyway. Having to configure this minimum size means you need to set a specific value. Maybe you can keep this new setting with its current usage and use -1 to mean "only write a compressed event if it’s actually smaller"?

…

SBR also works with this feature.

That’s good to hear.

Simon
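For what it's worth, Simon's "only compress if it actually helps" idea could look roughly like the following sketch (plain zlib, illustrative names; the fallback policy and the header-size allowance are assumptions, not part of the patch):

#include <zlib.h>
#include <cstring>
#include <cstddef>

/*
  Illustrative policy: try to compress the event body with zlib; if the
  result plus the compressed-record header would not be smaller than the
  original, keep the original bytes instead.  Returns false only if the
  destination buffer is too small for either outcome.
*/
static bool maybe_compress(const unsigned char *src, size_t src_len,
                           unsigned char *dst, size_t dst_capacity,
                           size_t *out_len, bool *compressed)
{
  uLongf comp_len= dst_capacity;
  if (compress((Bytef *)dst, &comp_len, (const Bytef *)src, src_len) == Z_OK &&
      comp_len + 1 + 4 < src_len)      /* 1 header byte + up to 4 length bytes */
  {
    *out_len= comp_len;
    *compressed= true;
    return true;
  }
  if (src_len > dst_capacity)          /* cannot even store the original */
    return false;
  memcpy(dst, src, src_len);           /* fall back to the uncompressed body */
  *out_len= src_len;
  *compressed= false;
  return true;
}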
Hi knielsen,

The new code is here: https://github.com/vinchen/server/commits/GCSAdmin-10.2-binlog-compressed2-2

It is based on https://github.com/knielsen/server/commits/GCSAdmin-10.2-binlog-compressed2-... and adds two fixes:

1. Avoid overflowing buffers in case of corrupt events.
2. Check the compression algorithm.

2016-10-28 21:06 GMT+08:00 Simon Mudd <simon.mudd@booking.com>:
Hi,
On 28 Oct 2016, at 06:18, 陈福荣 <vinchen13@gmail.com> wrote:
Thanks for knielsen's nice reply. I would like to add some comments
…
(2) Senders would try to compress the event as requested. If the compressed event is not any smaller then do not bother compressing it, just send the original event.
This is not in the current patch, but it could be added easily enough. It might not be needed though, the log_bin_compress_min_len kind of serves a similar purpose.
It just saves the receiver doing extra work and I’d expect that allowing that would be trivial even if not implemented now. Just a thought anyway.
As knielsen said, the main purpose of log_bin_compress_min_len is to avoid compressing binlog events that are too small.
Can’t you just find out how big the compressed event is and, if it’s not smaller than the original event, send the original? That way you know you get some extra compression, which if you’re enabling this feature is what you want anyway. Having to configure this minimum size means you need to set a specific value. Maybe you can keep this new setting with its current usage and use -1 to mean "only write a compressed event if it’s actually smaller"?
…
SBR also works with this feature.
That’s good to hear.
Simon
2016-10-21 1:06 GMT+08:00 Kristian Nielsen <knielsen@knielsen-hq.org>:
GCSAdmin <notifications@github.com> writes:
We add new event types to support compressing the binlog, as follows: QUERY_COMPRESSED_EVENT, WRITE_ROWS_COMPRESSED_EVENT_V1, UPDATE_ROWS_COMPRESSED_EVENT_V1, DELETE_ROWS_COMPRESSED_EVENT_V1, WRITE_ROWS_COMPRESSED_EVENT, UPDATE_ROWS_COMPRESSED_EVENT, DELETE_ROWS_COMPRESSED_EVENT
Thanks for the patch!
Overall, I'm much impressed with the quality, I see a lot of attention to detail. Below, I have a number of comments and suggestions, but they are all minor for a patch of this complexity.
A number of the comments are style fixes or simple changes, and I found it easier to just do the changes to the source for you to review. I hope that is ok. I rebased the series without intermediate merge to simplify review to a single patch. The rebase with my suggested changes on top is here:
https://github.com/knielsen/server/commits/GCSAdmin-10.2-binlog-compressed2-2
Please check the changes I made and let me know if you disagree with anything or I misunderstood something. Changes are mostly:
- A bunch of code style (indentation, lines <= 80 chars, clarify comments, and so on).
- Change LOG_EVENT_IS_QUERY() etc. from macros to static inline functions.
- check_event_type() updated (new code merged into 10.2 recently).
- Minor .result file update, also from code merged into 10.2 recently.
Thanks for your changes, they seem ok. And my new code will be based on them.
I also have the following questions/suggestions:
1. I think the code should sanity-check the compressed event header, to check that it does not access outside the event buffer in corrupt events. Eg. if lenlen is 7 and there is less than 7 bytes available in the event. I know that replication code in general is not robust to corrupt events, but it seems best that new code avoids overflowing buffers in case of corrupt event data.
Yes, you are right. I will add the check to avoid overflowing.
2. There are some places where the compressed event types are not added to switch() or if() statements, mainly related to minimal binlog images, I think: Rows_log_event::read_write_bitmaps_cmp(), Rows_log_event::get_data_size(), Rows_log_event::do_apply_event(), Rows_log_event::write_data_body(). I *think* the reason for that is that the SQL thread never sees the compressed events (they are uncompressed by the IO thread), but I would like your confirmation that my understanding is correct.
In the SQL thread, the compressed event will uncompress the "body" in its constructor. So most member functions of the compressed events can be inherited directly; there is no need to override them.
Take read_write_bitmaps_cmp() for example: the general_type_code is the same for WRITE_ROWS_EVENT and WRITE_ROWS_COMPRESSED_EVENT, and the body is uncompressed in the constructor for compressed events, so there is no need to change it.

bool read_write_bitmaps_cmp(TABLE *table)
{
  bool res= FALSE;
  switch (get_general_type_code())
  {
    case DELETE_ROWS_EVENT:
      res= bitmap_cmp(get_cols(), table->read_set);
      break;
    case UPDATE_ROWS_EVENT:
      res= (bitmap_cmp(get_cols(), table->read_set) &&
            bitmap_cmp(get_cols_ai(), table->rpl_write_set));
      break;
    case WRITE_ROWS_EVENT:
      res= bitmap_cmp(get_cols(), table->rpl_write_set);
      break;
    default:
      /*
        We should just compare bitmaps for Delete, Write
        or Update rows events.
      */
      DBUG_ASSERT(0);
  }
  return res;
}
3. Did you think about testing that BINLOG statements with compressed events can execute correctly? This happens with mysqlbinlog | mysql, when there are (compressed) row-based events in the binlog. I'm wondering if this could expose compressed events to code that normally runs in the SQL thread and expects to have events uncompressed for it by the IO thread?
As mentioned above, the compressed events are uncompressed in the constructor in the SQL thread. So it would also work even if the IO thread did nothing. Of course, it also works with "mysqlbinlog | mysql". I have added some test cases for that in mysql-test, such as "main.mysqlbinlog_row_compressed" and "main.mysqlbinlog_stmt_compressed".
A few detailed comments on the patch below (most comments are done as changes in the above-linked git branch).
Otherwise, after answer to above 3 questions, I think the patch looks good to go into 10.2.
- Kristian.
-----------------------------------------------------------------------
+#define BINLOG_COMPRESSED_HEADER_LEN 1
+#define BINLOG_COMPRESSED_ORIGINAL_LENGTH_MAX_BYTES 4
+/**
+  Compressed Record
+    Record Header: 1 Byte
+             0 Bit: Always 1, mean compressed;
+           1-3 Bit: Reversed, compressed algorithm, Always 0, means zlib
+           4-7 Bit: Bytes of "Record Original Length"
+    Record Original Length: 1-4 Bytes
+    Compressed Buf:
I did not understand this. Why "reversed"? It seems to be bits 0-2 that have the bytes of "Record Original Length" and bit 7 that has the '1' bit meaning compression enabled? Maybe this can be clarified.
Sorry, maybe there is some mistake here. The following should be right:

Compressed Record
  Record Header: 1 Byte
      7 Bit: Always 1, means compressed;
    4-6 Bit: Reserved for the compression algorithm. Now it is always 0, meaning zlib. It may support other compression algorithms in the future.
    0-3 Bit: Bytes of "Record Original Length"
  Record Original Length: 1-4 Bytes
  Compressed Buf:
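For readers following along, a small hedged sketch of how this header could be decoded, using the masks visible in the quoted patch code (0x80 for the compressed flag, 0x07 for the count of length bytes); the function name and the byte order of the length field are assumptions for illustration, not taken from the patch:

#include <cstdint>
#include <cstddef>

/*
  Hypothetical decoder for the compressed-record header described above.
  The top bit of the first byte is the "compressed" flag, the low bits give
  how many bytes encode the original (uncompressed) length, and the middle
  bits are reserved for the compression algorithm (0 = zlib).  The byte
  order used for the length field below is an assumption for illustration.
*/
static bool decode_compressed_header(const unsigned char *buf, size_t buf_len,
                                     uint32_t *original_len, size_t *header_len)
{
  if (buf_len < 1 || (buf[0] & 0x80) == 0)        /* must be marked compressed */
    return false;
  size_t lenlen= buf[0] & 0x07;                   /* 1-4 length bytes follow */
  if (lenlen == 0 || lenlen > 4 || buf_len < 1 + lenlen)
    return false;
  uint32_t len= 0;
  for (size_t i= 0; i < lenlen; i++)              /* assumed big-endian here */
    len= (len << 8) | buf[1 + i];
  *original_len= len;
  *header_len= 1 + lenlen;                        /* compressed payload follows */
  return true;
}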
+int query_event_uncompress(const Format_description_log_event *description_event, bool contain_checksum,
+                           const char *src, char* buf, ulong buf_size, bool* is_malloc,
+                           char **dst, ulong *newlen)
+{
+  ulong len = uint4korr(src + EVENT_LEN_OFFSET);
+  const char *tmp = src;
+
+  DBUG_ASSERT((uchar)src[EVENT_TYPE_OFFSET] == QUERY_COMPRESSED_EVENT);
+
+  uint8 common_header_len= description_event->common_header_len;
+  uint8 post_header_len= description_event->post_header_len[QUERY_COMPRESSED_EVENT-1];
+
+  tmp += common_header_len;
+
+  uint db_len = (uint)tmp[Q_DB_LEN_OFFSET];
+  uint16 status_vars_len= uint2korr(tmp + Q_STATUS_VARS_LEN_OFFSET);
+
+  tmp += post_header_len + status_vars_len + db_len + 1;
+
+  uint32 un_len = binlog_get_uncompress_len(tmp);
+  *newlen = (tmp - src) + un_len;
This is one place I would like to see a check on the data in the event. Maybe just that 'len' is larger than 1+(tmp[0]&7). Or maybe simply that len is at least 10, the minimal size for compressed events.
+int row_log_event_uncompress(const Format_description_log_event *description_event, bool contain_checksum,
+                             const char *src, char* buf, ulong buf_size, bool* is_malloc,
+                             char **dst, ulong *newlen)
+{
+  uint32 un_len = binlog_get_uncompress_len(tmp);
+  *newlen = (tmp - src) + un_len;
Another place where I'd like a check against looking outside the event buffer.
+int binlog_buf_uncompress(const char *src, char *dst, uint32 len, uint32 *newlen)
+{
+  if((src[0] & 0x80) == 0)
+  {
+    return 1;
+  }
+
+  uint32 lenlen = src[0] & 0x07;
+  uLongf buflen = *newlen;
+  if(uncompress((Bytef *)dst, &buflen, (const Bytef*)src + 1 + lenlen, len) != Z_OK)
Again check on length.
Ok, I will add the check.
Participants (5): Daniel Black, Kristian Nielsen, MARK CALLAGHAN, Simon Mudd, 陈福荣