[Maria-developers] Review of base64.diff and german2.diff
Hi! Here is the review for the code that we should put into 10.0 First the base64:
=== modified file 'mysql-test/t/func_str.test' --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
=== modified file 'mysys/base64.c' --- mysys/base64.c 2011-06-30 15:46:53 +0000 +++ mysys/base64.c 2013-03-09 06:22:59 +0000 @@ -1,5 +1,4 @@ -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc. - Use is subject to license terms. +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
Removed 'all rights reserved'. You can't have that for GPL code. If there is any new code from us, please add a copyright message for the MariaDB foundation too!
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ "abcdefghijklmnopqrstuvwxyz" "0123456789+/";
+/** + * Maximum length base64_needed_encoded_length() + * can handle without overflow. + */ +int +base64_encode_max_arg_length() +{ +#if (SIZEOF_INT == 8) + /* + 6827690988321067803 -> 9223372036854775805 + 6827690988321067804 -> -9223372036854775807 + */
Please describe from where the following number comes from (so that anyone can recalculate and check the number if needed):
+ return 0x5EC0D4C77B03531BLL +#else + /* + 1589695686 -> 2147483646 + 1589695687 -> -2147483645 + */ + return 0x5EC0D4C6; +#endif +} +
int base64_needed_encoded_length(int length_of_data) @@ -39,6 +60,21 @@ base64_needed_encoded_length(int length_ }
+/** + * Maximum length base64_needed_decoded_length() + * can handle without overflow. + */ +int +base64_decode_max_arg_length() +{
The following also need a comment.
+#if (SIZEOF_INT == 8) + return 0x2AAAAAAAAAAAAAAALL; +#else + return 0x2AAAAAAA; +#endif +} + +
<cut>
+/** + * Get the next character from a base64 encoded stream. + * Skip spaces, then scan the next base64 character or a pad character + * and collect bits into decoder->c. + * + * @param decoder Pointer to MY_BASE64_DECODER + * @return + * FALSE on success (a valid base64 encoding character found) + * TRUE on error (unexpected character or unexpected end-of-input found) + */
Add an empty line here. It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification). Some thing I figured out by asking bar on IRC: - We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
+static inline my_bool +my_base64_decoder_getch(MY_BASE64_DECODER *decoder)
Remove inline from the above; It's likely to make the code slower, not faster as this is a big function with lot of if's.
{ - char b[3]; - size_t i= 0; - char *dst_base= (char *)dst; - char const *src= src_base; - char *d= dst_base; - size_t j; + if (my_base64_decoder_skip_spaces(decoder)) + return TRUE; /* End-of-input */
- while (i < len) + if (!my_base64_add(decoder)) /* Valid base64 character found */ { - unsigned c= 0; - size_t mark= 0; + if (decoder->mark) + { + /* If we have scanned '=' already, then only '=' is valid */ + DBUG_ASSERT(decoder->state == 3); + decoder->error= 1; + decoder->src--; + return TRUE; /* expected '=', but encoding character found */ + } + decoder->state++; + return FALSE; + }
If you want to have the funtion inline, then move the following to another not inline function; We don't need to generate code for this for every function call:
+ + /* Process error */ + switch (decoder->state) + { + case 0: + case 1: + decoder->src--; + return TRUE; /* base64 character expected */ + break; + + case 2: + case 3: + if (decoder->src[-1] == '=') + { + decoder->error= 0; /* Not an error - it's a pad character */ + decoder->mark++; + } + else + { + decoder->src--; + return TRUE; /* base64 character or '=' expected */ + } + break; + default: + DBUG_ASSERT(0); + return TRUE; /* Wrong state, should not happen */ + }
<cut> Add to the comment for base64_decode that the 'dst' buffer has to be at least 2 character longer than what is needed to decode src_base.
+int +base64_decode(const char *src_base, size_t len, + void *dst, const char **end_ptr, int flags) +{ + if (my_base64_decoder_getch(&decoder) || + my_base64_decoder_getch(&decoder) || + my_base64_decoder_getch(&decoder) || + my_base64_decoder_getch(&decoder)) + break;
Generating the error handling with a switch for every of the above is wasteful. See previous comment.
+ /* Return error if there are more non-space characters */ + decoder.state= 0; + if (!my_base64_decoder_skip_spaces(&decoder)) + decoder.error= 1; + if (end_ptr != NULL)
Note that base64_needed_decoded_length() used ceil() when it's not needed. <cut>
=== modified file 'sql/item_strfunc.cc' @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix_length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix_length_and_dec() +{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(), DERIVATION_COERCIBLE, MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this. Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
+ if (args[0]->max_length > (uint) base64_encode_max_arg_length()) + { + maybe_null= 1; + fix_char_length_ulonglong((ulonglong) base64_encode_max_arg_length()); + } + else + { + int length= base64_needed_encoded_length((int) args[0]->max_length); + DBUG_ASSERT(length > 0);
Don't think assert is needed as the function gurantees it already.
+ fix_char_length_ulonglong((ulonglong) length - 1); + } +}
<cut>
+String *Item_func_from_base64::val_str(String *str) +{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((int) res->length())) > + current_thd->variables.max_allowed_packet)) || + tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null. Something like 'Malformed base64 string. Error at position %d" would be nice.
+ null_value= 1; // NULL input, too long input, OOM, or badly formed input + if (too_long) + { + push_warning_printf(current_thd, Sql_condition::WARN_LEVEL_WARN, + ER_WARN_ALLOWED_PACKET_OVERFLOWED, + ER(ER_WARN_ALLOWED_PACKET_OVERFLOWED), func_name(), + current_thd->variables.max_allowed_packet); + } + return 0; + } + tmp_value.length((uint) length); + null_value= 0; + return &tmp_value; +}
<cut> Review of german2.diff === modified file 'include/my_global.h' --- include/my_global.h 2013-07-21 14:39:19 +0000 +++ include/my_global.h 2013-08-29 08:27:09 +0000 @@ -1218,4 +1218,11 @@ static inline double rint(double x) #define HAVE_EXTERNAL_CLIENT #endif /* EMBEDDED_LIBRARY */ + +enum loglevel { + ERROR_LEVEL= 0, + WARNING_LEVEL= 1, + INFORMATION_LEVEL= 2 +}; + #endif /* my_global_h */ Not sure why the above, which is a my_sys construct, should be in my_global.h. Please add at least a comment about this. === modified file 'mysys/charset.c' --- mysys/charset.c 2012-08-14 14:23:34 +0000 +++ mysys/charset.c 2013-08-29 12:38:44 +0000 @@ -66,6 +66,9 @@ static my_bool init_state_maps(struct ch if (!(cs->ident_map= ident_map= (uchar*) my_once_alloc(256, MYF(MY_WME)))) return 1; + state_map= (uchar *) cs->state_map; + ident_map= (uchar *) cs->ident_map; Why is the above needed? They are already set one line above. (In addition cs->state_map and cs->ident_map are already uchar *) --- strings/ctype-simple.c 2013-07-21 14:39:19 +0000 +++ strings/ctype-simple.c 2013-08-29 09:05:07 +0000 @@ -1163,12 +1163,12 @@ static int pcmp(const void * f, const vo return res; } -static my_bool create_fromuni(struct charset_info_st *cs, - void *(*alloc)(size_t)) +static my_bool +create_fromuni(struct charset_info_st *cs, + MY_CHARSET_LOADER *loader) { uni_idx idx[PLANE_NUM]; int i,n; - struct my_uni_idx_st *tab_from_uni; /* Check that Unicode map is loaded. @@ -1209,18 +1209,18 @@ static my_bool create_fromuni(struct cha for (i=0; i < PLANE_NUM; i++) { int ch,numchars; - uchar *tab; /* Skip empty plane */ if (!idx[i].nchars) break; numchars=idx[i].uidx.to-idx[i].uidx.from+1; - if (!(idx[i].uidx.tab= tab= (uchar*) - alloc(numchars * sizeof(*idx[i].uidx.tab)))) + if (!(idx[i].uidx.tab= (uchar *) + (loader->once_alloc) (numchars * + sizeof(*idx[i].uidx.tab)))) return TRUE; - bzero(tab,numchars*sizeof(*tab)); + bzero((char*) idx[i].uidx.tab, numchars * sizeof(*idx[i].uidx.tab)); for (ch=1; ch < PLANE_SIZE; ch++) { @@ -1228,32 +1228,32 @@ static my_bool create_fromuni(struct cha if (wc >= idx[i].uidx.from && wc <= idx[i].uidx.to && wc) { int ofs= wc - idx[i].uidx.from; - tab[ofs]= ch; + ((char *) idx[i].uidx.tab)[ofs]= ch; } } } Why remove tab to be a shortcut for idx[i].uidx.from ? Shouldn't it be faster and shorter to use tab than using idx[i].uidx.from? (At least in this place) -static const uint16 page0FCdata[]= { /* FC00 (3 weights per char) */ +static uint16 page0FCdata[]= { /* FC00 (3 weights per char) */ 0x134F,0x135E,0x0000, 0x134F,0x1364,0x0000, 0x134F,0x13B0,0x0000, 0x134F,0x13C7,0x0000, 0x134F,0x13C8,0x0000, 0x1352,0x135E,0x0000, 0x1352,0x1364,0x0000, 0x1352,0x1365,0x0000, 0x1352,0x13B0,0x0000, @@ -6006,7 +6005,7 @@ static const uint16 page0FCdata[]= { /* 0x1381,0x13C8,0x0000, 0x1382,0x13C7,0x0000, 0x1382,0x13C8,0x0000, 0x1364,0x13C7,0x0000 }; If we are not going to change the above, better to keep these as const! They will then be in the const segment and you will have protection from anyone trying to change these! <cut> +my_uca_add_contraction(MY_CONTRACTIONS *list, my_wc_t *wc, size_t len, + my_bool with_context) { - MY_CONTRACTIONS *list= (MY_CONTRACTIONS*) cs->contractions; MY_CONTRACTION *next= &list->item[list->nitems]; - DBUG_ASSERT(len == 2); /* We currently support only contraction2 */ - next->ch[0]= wc[0]; - next->ch[1]= wc[1]; + size_t i; + /* + Contraction is always at least 2 characters. + Contraction is never longer than MY_UCA_MAX_CONTRACTION, + which is guaranteed by using my_coll_rule_expand() with proper limit. + */ + DBUG_ASSERT(len > 1 && len <= MY_UCA_MAX_CONTRACTION); + for (i= 0; i < len; i++) + { + /* + We don't support contractions with U+0000. + my_coll_rule_expand() guarantees there're no U+0000 in a contraction. + */ + DBUG_ASSERT(wc[i] != 0); + next->ch[i]= wc[i]; + } + if (i < MY_UCA_MAX_CONTRACTION) + next->ch[i]= 0; /* Add end-of-line marker */ Wouldn't it make life easier to always have the marker there? (Ie, always have place for the marker). At least you can remove one if from the code. If there is more than one if to remove, then it's a simple optimzation do to... <cut> +static int +my_coll_parser_too_long_error(MY_COLL_RULE_PARSER *p, const char *name) +{ + my_snprintf(p->errstr, sizeof(p->errstr), "%s is too long", name); + return 0; +} You should limit '%s' to 64 characters so that one gets the name in the output even if it's over sizeof()... %-.64s .... <cut> Regards, Monty
Hi Monty, thanks for review. I have addressed most of your suggestions. See the new version attached, and the detailed comments inline: On 09/12/2013 04:32 PM, Michael Widenius wrote:
Hi!
Here is the review for the code that we should put into 10.0
First the base64:
=== modified file 'mysql-test/t/func_str.test' --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
This code is (almost) copy-and-paste from MySQL-5.6. I think it's a good idea when looking inside a test file to be able to see which tests are coming from MySQL-5.6, and which tests are coming from MariaDB-10.0. I'd suggest to keep "Start of 5.6 tests" in this particular case, and also when merging the tests for the other merged MySQL-5.6 features.
=== modified file 'mysys/base64.c' --- mysys/base64.c 2011-06-30 15:46:53 +0000 +++ mysys/base64.c 2013-03-09 06:22:59 +0000 @@ -1,5 +1,4 @@ -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc. - Use is subject to license terms. +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
Removed 'all rights reserved'. You can't have that for GPL code.
If there is any new code from us, please add a copyright message for the MariaDB foundation too!
Removed 'all rights reserved' and added "MariaDB", as there are some our own changes.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ "abcdefghijklmnopqrstuvwxyz" "0123456789+/";
+/** + * Maximum length base64_needed_encoded_length() + * can handle without overflow. + */ +int +base64_encode_max_arg_length() +{ +#if (SIZEOF_INT == 8) + /* + 6827690988321067803 -> 9223372036854775805 + 6827690988321067804 -> -9223372036854775807 + */
Please describe from where the following number comes from (so that anyone can recalculate and check the number if needed):
I added more comments where these numbers come from.
+ return 0x5EC0D4C77B03531BLL +#else + /* + 1589695686 -> 2147483646 + 1589695687 -> -2147483645 + */ + return 0x5EC0D4C6; +#endif +} +
int base64_needed_encoded_length(int length_of_data) @@ -39,6 +60,21 @@ base64_needed_encoded_length(int length_ }
+/** + * Maximum length base64_needed_decoded_length() + * can handle without overflow. + */ +int +base64_decode_max_arg_length() +{
The following also need a comment.
Also added comments.
+#if (SIZEOF_INT == 8) + return 0x2AAAAAAAAAAAAAAALL; +#else + return 0x2AAAAAAA; +#endif +} + +
<cut>
+/** + * Get the next character from a base64 encoded stream. + * Skip spaces, then scan the next base64 character or a pad character + * and collect bits into decoder->c. + * + * @param decoder Pointer to MY_BASE64_DECODER + * @return + * FALSE on success (a valid base64 encoding character found) + * TRUE on error (unexpected character or unexpected end-of-input found) + */
Add an empty line here.
Done.
It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification).
Some thing I figured out by asking bar on IRC:
- We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
I added a reference to http://en.wikipedia.org/wiki/Base64, as well all added these your comments in proper places of the code.
+static inline my_bool +my_base64_decoder_getch(MY_BASE64_DECODER *decoder)
Remove inline from the above; It's likely to make the code slower, not faster as this is a big function with lot of if's.
Removed "inline".
{ - char b[3]; - size_t i= 0; - char *dst_base= (char *)dst; - char const *src= src_base; - char *d= dst_base; - size_t j; + if (my_base64_decoder_skip_spaces(decoder)) + return TRUE; /* End-of-input */
- while (i < len) + if (!my_base64_add(decoder)) /* Valid base64 character found */ { - unsigned c= 0; - size_t mark= 0; + if (decoder->mark) + { + /* If we have scanned '=' already, then only '=' is valid */ + DBUG_ASSERT(decoder->state == 3); + decoder->error= 1; + decoder->src--; + return TRUE; /* expected '=', but encoding character found */ + } + decoder->state++; + return FALSE; + }
If you want to have the funtion inline, then move the following to another not inline function; We don't need to generate code for this for every function call:
+ + /* Process error */ + switch (decoder->state) + { + case 0: + case 1: + decoder->src--; + return TRUE; /* base64 character expected */ + break; + + case 2: + case 3: + if (decoder->src[-1] == '=') + { + decoder->error= 0; /* Not an error - it's a pad character */ + decoder->mark++; + } + else + { + decoder->src--; + return TRUE; /* base64 character or '=' expected */ + } + break; + default: + DBUG_ASSERT(0); + return TRUE; /* Wrong state, should not happen */ + }
<cut>
Add to the comment for base64_decode that the 'dst' buffer has to be at least 2 character longer than what is needed to decode src_base.
I added this comment: * Note: 'dst' must have sufficient space to store the decoded data. * Use base64_needed_decoded_length() to calculate the correct space size.
+int +base64_decode(const char *src_base, size_t len, + void *dst, const char **end_ptr, int flags) +{ + if (my_base64_decoder_getch(&decoder) || + my_base64_decoder_getch(&decoder) || + my_base64_decoder_getch(&decoder) || + my_base64_decoder_getch(&decoder)) + break;
Generating the error handling with a switch for every of the above is wasteful. See previous comment.
Right. my_base64_decoder_getch() is not inline anymore.
+ /* Return error if there are more non-space characters */ + decoder.state= 0; + if (!my_base64_decoder_skip_spaces(&decoder)) + decoder.error= 1; + if (end_ptr != NULL)
Note that base64_needed_decoded_length() used ceil() when it's not needed.
Removed.
<cut>
=== modified file 'sql/item_strfunc.cc' @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix_length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix_length_and_dec() +{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(), DERIVATION_COERCIBLE, MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this.
Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
This is because base64_encode() and base64_decode() are used in the replication code using "int" as return value. When exposing TO_BASE64() and FROM_BASE64() to the SQL level, I did not want to touch the replication code.
+ if (args[0]->max_length > (uint) base64_encode_max_arg_length()) + { + maybe_null= 1; + fix_char_length_ulonglong((ulonglong) base64_encode_max_arg_length()); + } + else + { + int length= base64_needed_encoded_length((int) args[0]->max_length); + DBUG_ASSERT(length > 0);
Don't think assert is needed as the function gurantees it already.
+ fix_char_length_ulonglong((ulonglong) length - 1); + } +}
<cut>
+String *Item_func_from_base64::val_str(String *str) +{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((int) res->length())) > + current_thd->variables.max_allowed_packet)) || + tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null.
Something like 'Malformed base64 string. Error at position %d" would be nice.
Good idea. I have added a warning. Now it looks clear what went wrong if FROM_BASE64() returns NULL.
+ null_value= 1; // NULL input, too long input, OOM, or badly formed input + if (too_long) + { + push_warning_printf(current_thd, Sql_condition::WARN_LEVEL_WARN, + ER_WARN_ALLOWED_PACKET_OVERFLOWED, + ER(ER_WARN_ALLOWED_PACKET_OVERFLOWED), func_name(), + current_thd->variables.max_allowed_packet); + } + return 0; + } + tmp_value.length((uint) length); + null_value= 0; + return &tmp_value; +}
<cut>
<cut>
hi guys! one question, as a mariadb user... base64 will be exposed to sql layer? could i use TO_BASE64() and FROM_BASE64() at mysql client? there's a MDEV in JIRA about it, if you could attach it to this patch could be nice: MDEV-4387 <https://mariadb.atlassian.net/browse/MDEV-4387> and with time, implement base32 and base16: MDEV-4800<https://mariadb.atlassian.net/browse/MDEV-4800> base64 is very usefull, and many php users use it to "send/receive" files from database base32 from what i know is usefull only for OTP and other password apps base16 is a hexadecimal function HEX() and UNHEX() thanks guys 2013/9/17 Alexander Barkov <bar@mariadb.org>
Hi Monty,
thanks for review.
I have addressed most of your suggestions. See the new version attached, and the detailed comments inline:
On 09/12/2013 04:32 PM, Michael Widenius wrote:
Hi!
Here is the review for the code that we should put into 10.0
First the base64:
=== modified file 'mysql-test/t/func_str.test'
--- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
This code is (almost) copy-and-paste from MySQL-5.6. I think it's a good idea when looking inside a test file to be able to see which tests are coming from MySQL-5.6, and which tests are coming from MariaDB-10.0.
I'd suggest to keep "Start of 5.6 tests" in this particular case, and also when merging the tests for the other merged MySQL-5.6 features.
=== modified file 'mysys/base64.c'
--- mysys/base64.c 2011-06-30 15:46:53 +0000 +++ mysys/base64.c 2013-03-09 06:22:59 +0000 @@ -1,5 +1,4 @@ -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc. - Use is subject to license terms. +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
Removed 'all rights reserved'. You can't have that for GPL code.
If there is any new code from us, please add a copyright message for the MariaDB foundation too!
Removed 'all rights reserved' and added "MariaDB", as there are some our own changes.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ "abcdefghijklmnopqrstuvwxyz" "0123456789+/";
+/** + * Maximum length base64_needed_encoded_length() + * can handle without overflow. + */ +int +base64_encode_max_arg_length(**) +{ +#if (SIZEOF_INT == 8) + /* + 6827690988321067803 -> 9223372036854775805 + 6827690988321067804 -> -9223372036854775807 + */
Please describe from where the following number comes from (so that anyone can recalculate and check the number if needed):
I added more comments where these numbers come from.
+ return 0x5EC0D4C77B03531BLL
+#else + /* + 1589695686 -> 2147483646 + 1589695687 -> -2147483645 + */ + return 0x5EC0D4C6; +#endif +} +
int base64_needed_encoded_length(**int length_of_data) @@ -39,6 +60,21 @@ base64_needed_encoded_length(**int length_ }
+/** + * Maximum length base64_needed_decoded_length() + * can handle without overflow. + */ +int +base64_decode_max_arg_length(**) +{
The following also need a comment.
Also added comments.
+#if (SIZEOF_INT == 8)
+ return 0x2AAAAAAAAAAAAAAALL; +#else + return 0x2AAAAAAA; +#endif +} + +
<cut>
+/**
+ * Get the next character from a base64 encoded stream. + * Skip spaces, then scan the next base64 character or a pad character + * and collect bits into decoder->c. + * + * @param decoder Pointer to MY_BASE64_DECODER + * @return + * FALSE on success (a valid base64 encoding character found) + * TRUE on error (unexpected character or unexpected end-of-input found) + */
Add an empty line here.
Done.
It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification).
Some thing I figured out by asking bar on IRC:
- We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
I added a reference to http://en.wikipedia.org/wiki/**Base64<http://en.wikipedia.org/wiki/Base64> , as well all added these your comments in proper places of the code.
+static inline my_bool
+my_base64_decoder_getch(MY_**BASE64_DECODER *decoder)
Remove inline from the above; It's likely to make the code slower, not faster as this is a big function with lot of if's.
Removed "inline".
{
- char b[3]; - size_t i= 0; - char *dst_base= (char *)dst; - char const *src= src_base; - char *d= dst_base; - size_t j; + if (my_base64_decoder_skip_**spaces(decoder)) + return TRUE; /* End-of-input */
- while (i < len) + if (!my_base64_add(decoder)) /* Valid base64 character found */ { - unsigned c= 0; - size_t mark= 0; + if (decoder->mark) + { + /* If we have scanned '=' already, then only '=' is valid */ + DBUG_ASSERT(decoder->state == 3); + decoder->error= 1; + decoder->src--; + return TRUE; /* expected '=', but encoding character found */ + } + decoder->state++; + return FALSE; + }
If you want to have the funtion inline, then move the following to another not inline function; We don't need to generate code for this for every function call:
+
+ /* Process error */ + switch (decoder->state) + { + case 0: + case 1: + decoder->src--; + return TRUE; /* base64 character expected */ + break; + + case 2: + case 3: + if (decoder->src[-1] == '=') + { + decoder->error= 0; /* Not an error - it's a pad character */ + decoder->mark++; + } + else + { + decoder->src--; + return TRUE; /* base64 character or '=' expected */ + } + break; + default: + DBUG_ASSERT(0); + return TRUE; /* Wrong state, should not happen */ + }
<cut>
Add to the comment for base64_decode that the 'dst' buffer has to be at least 2 character longer than what is needed to decode src_base.
I added this comment:
* Note: 'dst' must have sufficient space to store the decoded data. * Use base64_needed_decoded_length() to calculate the correct space size.
+int
+base64_decode(const char *src_base, size_t len, + void *dst, const char **end_ptr, int flags) +{ + if (my_base64_decoder_getch(&**decoder) || + my_base64_decoder_getch(&**decoder) || + my_base64_decoder_getch(&**decoder) || + my_base64_decoder_getch(&**decoder)) + break;
Generating the error handling with a switch for every of the above is wasteful. See previous comment.
Right. my_base64_decoder_getch() is not inline anymore.
+ /* Return error if there are more non-space characters */
+ decoder.state= 0; + if (!my_base64_decoder_skip_**spaces(&decoder)) + decoder.error= 1; + if (end_ptr != NULL)
Note that base64_needed_decoded_length() used ceil() when it's not needed.
Removed.
<cut>
=== modified file 'sql/item_strfunc.cc'
@@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix_**length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix_**length_and_dec() +{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(**), DERIVATION_COERCIBLE, MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this.
Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
This is because base64_encode() and base64_decode() are used in the replication code using "int" as return value.
When exposing TO_BASE64() and FROM_BASE64() to the SQL level, I did not want to touch the replication code.
+ if (args[0]->max_length > (uint) base64_encode_max_arg_length()**)
+ { + maybe_null= 1; + fix_char_length_ulonglong((**ulonglong) base64_encode_max_arg_length()**); + } + else + { + int length= base64_needed_encoded_length((**int) args[0]->max_length); + DBUG_ASSERT(length > 0);
Don't think assert is needed as the function gurantees it already.
+ fix_char_length_ulonglong((**ulonglong) length - 1);
+ } +}
<cut>
+String *Item_func_from_base64::val_**str(String *str)
+{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((**int) res->length())) > + current_thd->variables.max_**allowed_packet)) || + tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null.
Something like 'Malformed base64 string. Error at position %d" would be nice.
Good idea. I have added a warning. Now it looks clear what went wrong if FROM_BASE64() returns NULL.
+ null_value= 1; // NULL input, too long input, OOM, or badly formed
input + if (too_long) + { + push_warning_printf(current_**thd, Sql_condition::WARN_LEVEL_** WARN, + ER_WARN_ALLOWED_PACKET_**OVERFLOWED, + ER(ER_WARN_ALLOWED_PACKET_**OVERFLOWED), func_name(), + current_thd->variables.max_**allowed_packet); + } + return 0; + } + tmp_value.length((uint) length); + null_value= 0; + return &tmp_value; +}
<cut>
<cut>
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
-- Roberto Spadim SPAEmpresarial
Hello Roberto, On 09/17/2013 06:38 PM, Roberto Spadim wrote:
hi guys! one question, as a mariadb user... base64 will be exposed to sql layer? could i use TO_BASE64() and FROM_BASE64() at mysql client? there's a MDEV in JIRA about it, if you could attach it to this patch could be nice: MDEV-4387 <https://mariadb.atlassian.net/browse/MDEV-4387>
and with time, implement base32 and base16: MDEV-4800 <https://mariadb.atlassian.net/browse/MDEV-4800>
base64 is very usefull, and many php users use it to "send/receive" files from database base32 from what i know is usefull only for OTP and other password apps base16 is a hexadecimal function HEX() and UNHEX()
Yes, TO_BASE64() and FROM_BASE64() will be exposed to the SQL level, so one case use for example things like this: INSERT INTO t1 VALUES (FROM_BASE64('base64-string')); or SELECT TO_BASE64(column) FROM t1; There are no plans to implement base32.
thanks guys
2013/9/17 Alexander Barkov <bar@mariadb.org <mailto:bar@mariadb.org>>
Hi Monty,
thanks for review.
I have addressed most of your suggestions. See the new version attached, and the detailed comments inline:
On 09/12/2013 04:32 PM, Michael Widenius wrote:
Hi!
Here is the review for the code that we should put into 10.0
First the base64:
=== modified file 'mysql-test/t/func_str.test' --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
This code is (almost) copy-and-paste from MySQL-5.6. I think it's a good idea when looking inside a test file to be able to see which tests are coming from MySQL-5.6, and which tests are coming from MariaDB-10.0.
I'd suggest to keep "Start of 5.6 tests" in this particular case, and also when merging the tests for the other merged MySQL-5.6 features.
=== modified file 'mysys/base64.c' --- mysys/base64.c 2011-06-30 15:46:53 +0000 +++ mysys/base64.c 2013-03-09 06:22:59 +0000 @@ -1,5 +1,4 @@ -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc. - Use is subject to license terms. +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
Removed 'all rights reserved'. You can't have that for GPL code.
If there is any new code from us, please add a copyright message for the MariaDB foundation too!
Removed 'all rights reserved' and added "MariaDB", as there are some our own changes.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ "abcdefghijklmnopqrstuvwxyz" "0123456789+/";
+/** + * Maximum length base64_needed_encoded_length() + * can handle without overflow. + */ +int +base64_encode_max_arg_length(__) +{ +#if (SIZEOF_INT == 8) + /* + 6827690988321067803 -> 9223372036854775805 + 6827690988321067804 -> -9223372036854775807 + */
Please describe from where the following number comes from (so that anyone can recalculate and check the number if needed):
I added more comments where these numbers come from.
+ return 0x5EC0D4C77B03531BLL +#else + /* + 1589695686 -> 2147483646 <tel:2147483646> + 1589695687 -> -2147483645 <tel:2147483645> + */ + return 0x5EC0D4C6; +#endif +} +
int base64_needed_encoded_length(__int length_of_data) @@ -39,6 +60,21 @@ base64_needed_encoded_length(__int length_ }
+/** + * Maximum length base64_needed_decoded_length() + * can handle without overflow. + */ +int +base64_decode_max_arg_length(__) +{
The following also need a comment.
Also added comments.
+#if (SIZEOF_INT == 8) + return 0x2AAAAAAAAAAAAAAALL; +#else + return 0x2AAAAAAA; +#endif +} + +
<cut>
+/** + * Get the next character from a base64 encoded stream. + * Skip spaces, then scan the next base64 character or a pad character + * and collect bits into decoder->c. + * + * @param decoder Pointer to MY_BASE64_DECODER + * @return + * FALSE on success (a valid base64 encoding character found) + * TRUE on error (unexpected character or unexpected end-of-input found) + */
Add an empty line here.
Done.
It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification).
Some thing I figured out by asking bar on IRC:
- We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
I added a reference to http://en.wikipedia.org/wiki/__Base64 <http://en.wikipedia.org/wiki/Base64>, as well all added these your comments in proper places of the code.
+static inline my_bool +my_base64_decoder_getch(MY___BASE64_DECODER *decoder)
Remove inline from the above; It's likely to make the code slower, not faster as this is a big function with lot of if's.
Removed "inline".
{ - char b[3]; - size_t i= 0; - char *dst_base= (char *)dst; - char const *src= src_base; - char *d= dst_base; - size_t j; + if (my_base64_decoder_skip___spaces(decoder)) + return TRUE; /* End-of-input */
- while (i < len) + if (!my_base64_add(decoder)) /* Valid base64 character found */ { - unsigned c= 0; - size_t mark= 0; + if (decoder->mark) + { + /* If we have scanned '=' already, then only '=' is valid */ + DBUG_ASSERT(decoder->state == 3); + decoder->error= 1; + decoder->src--; + return TRUE; /* expected '=', but encoding character found */ + } + decoder->state++; + return FALSE; + }
If you want to have the funtion inline, then move the following to another not inline function; We don't need to generate code for this for every function call:
+ + /* Process error */ + switch (decoder->state) + { + case 0: + case 1: + decoder->src--; + return TRUE; /* base64 character expected */ + break; + + case 2: + case 3: + if (decoder->src[-1] == '=') + { + decoder->error= 0; /* Not an error - it's a pad character */ + decoder->mark++; + } + else + { + decoder->src--; + return TRUE; /* base64 character or '=' expected */ + } + break; + default: + DBUG_ASSERT(0); + return TRUE; /* Wrong state, should not happen */ + }
<cut>
Add to the comment for base64_decode that the 'dst' buffer has to be at least 2 character longer than what is needed to decode src_base.
I added this comment:
* Note: 'dst' must have sufficient space to store the decoded data. * Use base64_needed_decoded_length() to calculate the correct space size.
+int +base64_decode(const char *src_base, size_t len, + void *dst, const char **end_ptr, int flags) +{ + if (my_base64_decoder_getch(&__decoder) || + my_base64_decoder_getch(&__decoder) || + my_base64_decoder_getch(&__decoder) || + my_base64_decoder_getch(&__decoder)) + break;
Generating the error handling with a switch for every of the above is wasteful. See previous comment.
Right. my_base64_decoder_getch() is not inline anymore.
+ /* Return error if there are more non-space characters */ + decoder.state= 0; + if (!my_base64_decoder_skip___spaces(&decoder)) + decoder.error= 1; + if (end_ptr != NULL)
Note that base64_needed_decoded_length() used ceil() when it's not needed.
Removed.
<cut>
=== modified file 'sql/item_strfunc.cc' @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix___length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix___length_and_dec() +{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(__), DERIVATION_COERCIBLE, MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this.
Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
This is because base64_encode() and base64_decode() are used in the replication code using "int" as return value.
When exposing TO_BASE64() and FROM_BASE64() to the SQL level, I did not want to touch the replication code.
+ if (args[0]->max_length > (uint) base64_encode_max_arg_length()__) + { + maybe_null= 1; + fix_char_length_ulonglong((__ulonglong) base64_encode_max_arg_length()__); + } + else + { + int length= base64_needed_encoded_length((__int) args[0]->max_length); + DBUG_ASSERT(length > 0);
Don't think assert is needed as the function gurantees it already.
+ fix_char_length_ulonglong((__ulonglong) length - 1); + } +}
<cut>
+String *Item_func_from_base64::val___str(String *str) +{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((__int) res->length())) > + current_thd->variables.max___allowed_packet)) || + tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null.
Something like 'Malformed base64 string. Error at position %d" would be nice.
Good idea. I have added a warning. Now it looks clear what went wrong if FROM_BASE64() returns NULL.
+ null_value= 1; // NULL input, too long input, OOM, or badly formed input + if (too_long) + { + push_warning_printf(current___thd, Sql_condition::WARN_LEVEL___WARN, + ER_WARN_ALLOWED_PACKET___OVERFLOWED, + ER(ER_WARN_ALLOWED_PACKET___OVERFLOWED), func_name(), + current_thd->variables.max___allowed_packet); + } + return 0; + } + tmp_value.length((uint) length); + null_value= 0; + return &tmp_value; +}
<cut>
<cut>
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net <mailto:maria-developers@lists.launchpad.net> Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
-- Roberto Spadim SPAEmpresarial
nice =) after this patch commited, could report where i could get it? i will try to rewrite to base32 if possible :) and send at jira thanks! 2013/9/17 Alexander Barkov <bar@mariadb.org>
Hello Roberto,
On 09/17/2013 06:38 PM, Roberto Spadim wrote:
hi guys! one question, as a mariadb user... base64 will be exposed to sql layer? could i use TO_BASE64() and FROM_BASE64() at mysql client? there's a MDEV in JIRA about it, if you could attach it to this patch could be nice: MDEV-4387 <https://mariadb.atlassian.** net/browse/MDEV-4387 <https://mariadb.atlassian.net/browse/MDEV-4387>>
and with time, implement base32 and base16: MDEV-4800 <https://mariadb.atlassian.**net/browse/MDEV-4800<https://mariadb.atlassian.net/browse/MDEV-4800>
base64 is very usefull, and many php users use it to "send/receive" files from database base32 from what i know is usefull only for OTP and other password apps base16 is a hexadecimal function HEX() and UNHEX()
Yes, TO_BASE64() and FROM_BASE64() will be exposed to the SQL level, so one case use for example things like this:
INSERT INTO t1 VALUES (FROM_BASE64('base64-string'))**;
or
SELECT TO_BASE64(column) FROM t1;
There are no plans to implement base32.
thanks guys
2013/9/17 Alexander Barkov <bar@mariadb.org <mailto:bar@mariadb.org>>
Hi Monty,
thanks for review.
I have addressed most of your suggestions. See the new version attached, and the detailed comments inline:
On 09/12/2013 04:32 PM, Michael Widenius wrote:
Hi!
Here is the review for the code that we should put into 10.0
First the base64:
=== modified file 'mysql-test/t/func_str.test' --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
This code is (almost) copy-and-paste from MySQL-5.6. I think it's a good idea when looking inside a test file to be able to see which tests are coming from MySQL-5.6, and which tests are coming from MariaDB-10.0.
I'd suggest to keep "Start of 5.6 tests" in this particular case, and also when merging the tests for the other merged MySQL-5.6 features.
=== modified file 'mysys/base64.c' --- mysys/base64.c 2011-06-30 15:46:53 +0000 +++ mysys/base64.c 2013-03-09 06:22:59 +0000 @@ -1,5 +1,4 @@ -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc. - Use is subject to license terms. +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
Removed 'all rights reserved'. You can't have that for GPL code.
If there is any new code from us, please add a copyright message for the MariaDB foundation too!
Removed 'all rights reserved' and added "MariaDB", as there are some our own changes.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ "abcdefghijklmnopqrstuvwxyz" "0123456789+/";
+/** + * Maximum length base64_needed_encoded_length() + * can handle without overflow. + */ +int +base64_encode_max_arg_length(**__)
+{ +#if (SIZEOF_INT == 8) + /* + 6827690988321067803 -> 9223372036854775805 + 6827690988321067804 -> -9223372036854775807 + */
Please describe from where the following number comes from (so that anyone can recalculate and check the number if needed):
I added more comments where these numbers come from.
+ return 0x5EC0D4C77B03531BLL +#else + /* + 1589695686 -> 2147483646 <tel:2147483646> + 1589695687 -> -2147483645 <tel:2147483645>
+ */ + return 0x5EC0D4C6; +#endif +} +
int base64_needed_encoded_length(_**_int length_of_data) @@ -39,6 +60,21 @@ base64_needed_encoded_length(_**_int length_
}
+/** + * Maximum length base64_needed_decoded_length() + * can handle without overflow. + */ +int +base64_decode_max_arg_length(**__)
+{
The following also need a comment.
Also added comments.
+#if (SIZEOF_INT == 8) + return 0x2AAAAAAAAAAAAAAALL; +#else + return 0x2AAAAAAA; +#endif +} + +
<cut>
+/** + * Get the next character from a base64 encoded stream. + * Skip spaces, then scan the next base64 character or a pad character + * and collect bits into decoder->c. + * + * @param decoder Pointer to MY_BASE64_DECODER + * @return + * FALSE on success (a valid base64 encoding character found) + * TRUE on error (unexpected character or unexpected end-of-input found) + */
Add an empty line here.
Done.
It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification).
Some thing I figured out by asking bar on IRC:
- We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
I added a reference to http://en.wikipedia.org/wiki/_**_Base64<http://en.wikipedia.org/wiki/__Base64> <http://en.wikipedia.org/wiki/**Base64<http://en.wikipedia.org/wiki/Base64>
,
as well all added these your comments in proper places of the code.
+static inline my_bool +my_base64_decoder_getch(MY___**BASE64_DECODER *decoder)
Remove inline from the above; It's likely to make the code slower, not faster as this is a big function with lot of if's.
Removed "inline".
{ - char b[3]; - size_t i= 0; - char *dst_base= (char *)dst; - char const *src= src_base; - char *d= dst_base; - size_t j; + if (my_base64_decoder_skip___**spaces(decoder))
+ return TRUE; /* End-of-input */
- while (i < len) + if (!my_base64_add(decoder)) /* Valid base64 character found */ { - unsigned c= 0; - size_t mark= 0; + if (decoder->mark) + { + /* If we have scanned '=' already, then only '=' is valid */ + DBUG_ASSERT(decoder->state == 3); + decoder->error= 1; + decoder->src--; + return TRUE; /* expected '=', but encoding character found */ + } + decoder->state++; + return FALSE; + }
If you want to have the funtion inline, then move the following to another not inline function; We don't need to generate code for this for every function call:
+ + /* Process error */ + switch (decoder->state) + { + case 0: + case 1: + decoder->src--; + return TRUE; /* base64 character expected */ + break; + + case 2: + case 3: + if (decoder->src[-1] == '=') + { + decoder->error= 0; /* Not an error - it's a pad character */ + decoder->mark++; + } + else + { + decoder->src--; + return TRUE; /* base64 character or '=' expected */ + } + break; + default: + DBUG_ASSERT(0); + return TRUE; /* Wrong state, should not happen */ + }
<cut>
Add to the comment for base64_decode that the 'dst' buffer has to be at least 2 character longer than what is needed to decode src_base.
I added this comment:
* Note: 'dst' must have sufficient space to store the decoded data. * Use base64_needed_decoded_length() to calculate the correct space size.
+int +base64_decode(const char *src_base, size_t len, + void *dst, const char **end_ptr, int flags) +{ + if (my_base64_decoder_getch(&__**decoder) || + my_base64_decoder_getch(&__**decoder) || + my_base64_decoder_getch(&__**decoder) || + my_base64_decoder_getch(&__**decoder))
+ break;
Generating the error handling with a switch for every of the above is wasteful. See previous comment.
Right. my_base64_decoder_getch() is not inline anymore.
+ /* Return error if there are more non-space characters */ + decoder.state= 0; + if (!my_base64_decoder_skip___**spaces(&decoder))
+ decoder.error= 1; + if (end_ptr != NULL)
Note that base64_needed_decoded_length() used ceil() when it's not needed.
Removed.
<cut>
=== modified file 'sql/item_strfunc.cc' @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix___** length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix___**length_and_dec()
+{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(**__), DERIVATION_COERCIBLE,
MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this.
Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
This is because base64_encode() and base64_decode() are used in the replication code using "int" as return value.
When exposing TO_BASE64() and FROM_BASE64() to the SQL level, I did not want to touch the replication code.
+ if (args[0]->max_length > (uint) base64_encode_max_arg_length()**__) + { + maybe_null= 1; + fix_char_length_ulonglong((__**ulonglong) base64_encode_max_arg_length()**__); + } + else + { + int length= base64_needed_encoded_length((**__int)
args[0]->max_length); + DBUG_ASSERT(length > 0);
Don't think assert is needed as the function gurantees it already.
+ fix_char_length_ulonglong((__**ulonglong) length - 1); + } +}
<cut>
+String *Item_func_from_base64::val___**str(String *str)
+{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((**__int) res->length())) > + current_thd->variables.max___**allowed_packet)) ||
+ tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null.
Something like 'Malformed base64 string. Error at position %d" would be nice.
Good idea. I have added a warning. Now it looks clear what went wrong if FROM_BASE64() returns NULL.
+ null_value= 1; // NULL input, too long input, OOM, or badly formed input + if (too_long) + { + push_warning_printf(current___**thd, Sql_condition::WARN_LEVEL___**WARN, + ER_WARN_ALLOWED_PACKET___** OVERFLOWED, + ER(ER_WARN_ALLOWED_PACKET___**OVERFLOWED), func_name(), + current_thd->variables.max___**allowed_packet);
+ } + return 0; + } + tmp_value.length((uint) length); + null_value= 0; + return &tmp_value; +}
<cut>
<cut>
______________________________**_________________ Mailing list: https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers> Post to : maria-developers@lists.**launchpad.net<maria-developers@lists.launchpad.net> <mailto:maria-developers@**lists.launchpad.net<maria-developers@lists.launchpad.net>
Unsubscribe : https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers> More help : https://help.launchpad.net/**ListHelp<https://help.launchpad.net/ListHelp>
-- Roberto Spadim SPAEmpresarial
-- Roberto Spadim SPAEmpresarial
Hi Roberto, the patch has been pushed to the 10.0 tree: lp:~maria-captains/maria/10.0 See here for more details about the 10.0 tree. https://launchpad.net/maria/10.0 On 09/17/2013 07:42 PM, Roberto Spadim wrote:
nice =) after this patch commited, could report where i could get it? i will try to rewrite to base32 if possible :) and send at jira thanks!
2013/9/17 Alexander Barkov <bar@mariadb.org <mailto:bar@mariadb.org>>
Hello Roberto,
On 09/17/2013 06:38 PM, Roberto Spadim wrote:
hi guys! one question, as a mariadb user... base64 will be exposed to sql layer? could i use TO_BASE64() and FROM_BASE64() at mysql client? there's a MDEV in JIRA about it, if you could attach it to this patch could be nice: MDEV-4387 <https://mariadb.atlassian.__net/browse/MDEV-4387 <https://mariadb.atlassian.net/browse/MDEV-4387>>
and with time, implement base32 and base16: MDEV-4800 <https://mariadb.atlassian.__net/browse/MDEV-4800 <https://mariadb.atlassian.net/browse/MDEV-4800>>
base64 is very usefull, and many php users use it to "send/receive" files from database base32 from what i know is usefull only for OTP and other password apps base16 is a hexadecimal function HEX() and UNHEX()
Yes, TO_BASE64() and FROM_BASE64() will be exposed to the SQL level, so one case use for example things like this:
INSERT INTO t1 VALUES (FROM_BASE64('base64-string'))__;
or
SELECT TO_BASE64(column) FROM t1;
There are no plans to implement base32.
thanks guys
2013/9/17 Alexander Barkov <bar@mariadb.org <mailto:bar@mariadb.org> <mailto:bar@mariadb.org <mailto:bar@mariadb.org>>>
Hi Monty,
thanks for review.
I have addressed most of your suggestions. See the new version attached, and the detailed comments inline:
On 09/12/2013 04:32 PM, Michael Widenius wrote:
Hi!
Here is the review for the code that we should put into 10.0
First the base64:
=== modified file 'mysql-test/t/func_str.test' --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
This code is (almost) copy-and-paste from MySQL-5.6. I think it's a good idea when looking inside a test file to be able to see which tests are coming from MySQL-5.6, and which tests are coming from MariaDB-10.0.
I'd suggest to keep "Start of 5.6 tests" in this particular case, and also when merging the tests for the other merged MySQL-5.6 features.
=== modified file 'mysys/base64.c' --- mysys/base64.c 2011-06-30 15:46:53 +0000 +++ mysys/base64.c 2013-03-09 06:22:59 +0000 @@ -1,5 +1,4 @@ -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc. - Use is subject to license terms. +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
Removed 'all rights reserved'. You can't have that for GPL code.
If there is any new code from us, please add a copyright message for the MariaDB foundation too!
Removed 'all rights reserved' and added "MariaDB", as there are some our own changes.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ
"abcdefghijklmnopqrstuvwxyz" "0123456789+/";
+/** + * Maximum length base64_needed_encoded_length() + * can handle without overflow. + */ +int +base64_encode_max_arg_length(____)
+{ +#if (SIZEOF_INT == 8) + /* + 6827690988321067803 -> 9223372036854775805 + 6827690988321067804 -> -9223372036854775807 + */
Please describe from where the following number comes from (so that anyone can recalculate and check the number if needed):
I added more comments where these numbers come from.
+ return 0x5EC0D4C77B03531BLL +#else + /* + 1589695686 -> 2147483646 <tel:2147483646> <tel:2147483646 <tel:2147483646>> + 1589695687 -> -2147483645 <tel:2147483645> <tel:2147483645 <tel:2147483645>>
+ */ + return 0x5EC0D4C6; +#endif +} +
int base64_needed_encoded_length(____int length_of_data) @@ -39,6 +60,21 @@ base64_needed_encoded_length(____int length_
}
+/** + * Maximum length base64_needed_decoded_length() + * can handle without overflow. + */ +int +base64_decode_max_arg_length(____)
+{
The following also need a comment.
Also added comments.
+#if (SIZEOF_INT == 8) + return 0x2AAAAAAAAAAAAAAALL; +#else + return 0x2AAAAAAA; +#endif +} + +
<cut>
+/** + * Get the next character from a base64 encoded stream. + * Skip spaces, then scan the next base64 character or a pad character + * and collect bits into decoder->c. + * + * @param decoder Pointer to MY_BASE64_DECODER + * @return + * FALSE on success (a valid base64 encoding character found) + * TRUE on error (unexpected character or unexpected end-of-input found) + */
Add an empty line here.
Done.
It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification).
Some thing I figured out by asking bar on IRC:
- We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
I added a reference to http://en.wikipedia.org/wiki/____Base64 <http://en.wikipedia.org/wiki/__Base64> <http://en.wikipedia.org/wiki/__Base64 <http://en.wikipedia.org/wiki/Base64>>,
as well all added these your comments in proper places of the code.
+static inline my_bool +my_base64_decoder_getch(MY_____BASE64_DECODER *decoder)
Remove inline from the above; It's likely to make the code slower, not faster as this is a big function with lot of if's.
Removed "inline".
{ - char b[3]; - size_t i= 0; - char *dst_base= (char *)dst; - char const *src= src_base; - char *d= dst_base; - size_t j; + if (my_base64_decoder_skip_____spaces(decoder))
+ return TRUE; /* End-of-input */
- while (i < len) + if (!my_base64_add(decoder)) /* Valid base64 character found */ { - unsigned c= 0; - size_t mark= 0; + if (decoder->mark) + { + /* If we have scanned '=' already, then only '=' is valid */ + DBUG_ASSERT(decoder->state == 3); + decoder->error= 1; + decoder->src--; + return TRUE; /* expected '=', but encoding character found */ + } + decoder->state++; + return FALSE; + }
If you want to have the funtion inline, then move the following to another not inline function; We don't need to generate code for this for every function call:
+ + /* Process error */ + switch (decoder->state) + { + case 0: + case 1: + decoder->src--; + return TRUE; /* base64 character expected */ + break; + + case 2: + case 3: + if (decoder->src[-1] == '=') + { + decoder->error= 0; /* Not an error - it's a pad character */ + decoder->mark++; + } + else + { + decoder->src--; + return TRUE; /* base64 character or '=' expected */ + } + break; + default: + DBUG_ASSERT(0); + return TRUE; /* Wrong state, should not happen */ + }
<cut>
Add to the comment for base64_decode that the 'dst' buffer has to be at least 2 character longer than what is needed to decode src_base.
I added this comment:
* Note: 'dst' must have sufficient space to store the decoded data. * Use base64_needed_decoded_length() to calculate the correct space size.
+int +base64_decode(const char *src_base, size_t len, + void *dst, const char **end_ptr, int flags) +{ + if (my_base64_decoder_getch(&____decoder) || + my_base64_decoder_getch(&____decoder) || + my_base64_decoder_getch(&____decoder) || + my_base64_decoder_getch(&____decoder))
+ break;
Generating the error handling with a switch for every of the above is wasteful. See previous comment.
Right. my_base64_decoder_getch() is not inline anymore.
+ /* Return error if there are more non-space characters */ + decoder.state= 0; + if (!my_base64_decoder_skip_____spaces(&decoder))
+ decoder.error= 1; + if (end_ptr != NULL)
Note that base64_needed_decoded_length() used ceil() when it's not needed.
Removed.
<cut>
=== modified file 'sql/item_strfunc.cc' @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix_____length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix_____length_and_dec()
+{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(____), DERIVATION_COERCIBLE,
MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this.
Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
This is because base64_encode() and base64_decode() are used in the replication code using "int" as return value.
When exposing TO_BASE64() and FROM_BASE64() to the SQL level, I did not want to touch the replication code.
+ if (args[0]->max_length > (uint) base64_encode_max_arg_length()____) + { + maybe_null= 1; + fix_char_length_ulonglong((____ulonglong) base64_encode_max_arg_length()____); + } + else + { + int length= base64_needed_encoded_length((____int)
args[0]->max_length); + DBUG_ASSERT(length > 0);
Don't think assert is needed as the function gurantees it already.
+ fix_char_length_ulonglong((____ulonglong) length - 1); + } +}
<cut>
+String *Item_func_from_base64::val_____str(String *str)
+{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((____int) res->length())) > + current_thd->variables.max_____allowed_packet)) ||
+ tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null.
Something like 'Malformed base64 string. Error at position %d" would be nice.
Good idea. I have added a warning. Now it looks clear what went wrong if FROM_BASE64() returns NULL.
+ null_value= 1; // NULL input, too long input, OOM, or badly formed input + if (too_long) + { + push_warning_printf(current_____thd, Sql_condition::WARN_LEVEL_____WARN, + ER_WARN_ALLOWED_PACKET_____OVERFLOWED, + ER(ER_WARN_ALLOWED_PACKET_____OVERFLOWED), func_name(), + current_thd->variables.max_____allowed_packet);
+ } + return 0; + } + tmp_value.length((uint) length); + null_value= 0; + return &tmp_value; +}
<cut>
<cut>
_________________________________________________ Mailing list: https://launchpad.net/~maria-__developers <https://launchpad.net/~maria-developers> Post to : maria-developers@lists.__launchpad.net <mailto:maria-developers@lists.launchpad.net> <mailto:maria-developers@__lists.launchpad.net <mailto:maria-developers@lists.launchpad.net>>
Unsubscribe : https://launchpad.net/~maria-__developers <https://launchpad.net/~maria-developers> More help : https://help.launchpad.net/__ListHelp <https://help.launchpad.net/ListHelp>
-- Roberto Spadim SPAEmpresarial
-- Roberto Spadim SPAEmpresarial
thanks alexander! :D i will read and test it as soon as possible :) 2013/9/23 Alexander Barkov <bar@mariadb.org>
Hi Roberto,
the patch has been pushed to the 10.0 tree:
lp:~maria-captains/maria/10.0
See here for more details about the 10.0 tree. https://launchpad.net/maria/**10.0 <https://launchpad.net/maria/10.0>
On 09/17/2013 07:42 PM, Roberto Spadim wrote:
nice =) after this patch commited, could report where i could get it? i will try to rewrite to base32 if possible :) and send at jira thanks!
2013/9/17 Alexander Barkov <bar@mariadb.org <mailto:bar@mariadb.org>>
Hello Roberto,
On 09/17/2013 06:38 PM, Roberto Spadim wrote:
hi guys! one question, as a mariadb user... base64 will be exposed to sql layer? could i use TO_BASE64() and FROM_BASE64() at mysql client? there's a MDEV in JIRA about it, if you could attach it to this patch could be nice: MDEV-4387 <https://mariadb.atlassian.__**net/browse/MDEV-4387
<https://mariadb.atlassian.**net/browse/MDEV-4387<https://mariadb.atlassian.net/browse/MDEV-4387>
and with time, implement base32 and base16: MDEV-4800 <https://mariadb.atlassian.__**net/browse/MDEV-4800
<https://mariadb.atlassian.**net/browse/MDEV-4800<https://mariadb.atlassian.net/browse/MDEV-4800>
base64 is very usefull, and many php users use it to "send/receive" files from database base32 from what i know is usefull only for OTP and other password apps base16 is a hexadecimal function HEX() and UNHEX()
Yes, TO_BASE64() and FROM_BASE64() will be exposed to the SQL level, so one case use for example things like this:
INSERT INTO t1 VALUES (FROM_BASE64('base64-string'))**__;
or
SELECT TO_BASE64(column) FROM t1;
There are no plans to implement base32.
thanks guys
2013/9/17 Alexander Barkov <bar@mariadb.org <mailto:bar@mariadb.org> <mailto:bar@mariadb.org
<mailto:bar@mariadb.org>>>
Hi Monty,
thanks for review.
I have addressed most of your suggestions. See the new version attached, and the detailed comments inline:
On 09/12/2013 04:32 PM, Michael Widenius wrote:
Hi!
Here is the review for the code that we should put into 10.0
First the base64:
=== modified file 'mysql-test/t/func_str.test' --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
This code is (almost) copy-and-paste from MySQL-5.6. I think it's a good idea when looking inside a test file to be able to see which tests are coming from MySQL-5.6, and which tests are coming from MariaDB-10.0.
I'd suggest to keep "Start of 5.6 tests" in this particular case, and also when merging the tests for the other merged MySQL-5.6 features.
=== modified file 'mysys/base64.c' --- mysys/base64.c 2011-06-30 15:46:53 +0000 +++ mysys/base64.c 2013-03-09 06:22:59 +0000 @@ -1,5 +1,4 @@ -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc. - Use is subject to license terms. +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
Removed 'all rights reserved'. You can't have that for GPL code.
If there is any new code from us, please add a copyright message for the MariaDB foundation too!
Removed 'all rights reserved' and added "MariaDB", as there are some our own changes.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ
"abcdefghijklmnopqrstuvwxyz" "0123456789+/";
+/** + * Maximum length base64_needed_encoded_length() + * can handle without overflow. + */ +int +base64_encode_max_arg_length(**____)
+{ +#if (SIZEOF_INT == 8) + /* + 6827690988321067803 -> 9223372036854775805 + 6827690988321067804 -> -9223372036854775807 + */
Please describe from where the following number comes from (so that anyone can recalculate and check the number if needed):
I added more comments where these numbers come from.
+ return 0x5EC0D4C77B03531BLL +#else + /* + 1589695686 -> 2147483646 <tel:2147483646> <tel:2147483646 <tel:2147483646>>
+ 1589695687 -> -2147483645 <tel:2147483645> <tel:2147483645 <tel:2147483645>>
+ */ + return 0x5EC0D4C6; +#endif +} +
int base64_needed_encoded_length(_**___int length_of_data) @@ -39,6 +60,21 @@ base64_needed_encoded_length(_**___int length_
}
+/** + * Maximum length base64_needed_decoded_length() + * can handle without overflow. + */ +int +base64_decode_max_arg_length(**____)
+{
The following also need a comment.
Also added comments.
+#if (SIZEOF_INT == 8) + return 0x2AAAAAAAAAAAAAAALL; +#else + return 0x2AAAAAAA; +#endif +} + +
<cut>
+/** + * Get the next character from a base64 encoded stream. + * Skip spaces, then scan the next base64 character or a pad character + * and collect bits into decoder->c. + * + * @param decoder Pointer to MY_BASE64_DECODER + * @return + * FALSE on success (a valid base64 encoding character found) + * TRUE on error (unexpected character or unexpected end-of-input found) + */
Add an empty line here.
Done.
It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification).
Some thing I figured out by asking bar on IRC:
- We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
I added a reference to http://en.wikipedia.org/wiki/_**___Base64<http://en.wikipedia.org/wiki/____Base64> <http://en.wikipedia.org/wiki/**__Base64<http://en.wikipedia.org/wiki/__Base64>
<http://en.wikipedia.org/wiki/**__Base64<http://en.wikipedia.org/wiki/__Base64> <http://en.wikipedia.org/wiki/**Base64<http://en.wikipedia.org/wiki/Base64>
,
as well all added these your comments in proper places of the code.
+static inline my_bool +my_base64_decoder_getch(MY___**__BASE64_DECODER
*decoder)
Remove inline from the above; It's likely to make the code slower, not faster as this is a big function with lot of if's.
Removed "inline".
{ - char b[3]; - size_t i= 0; - char *dst_base= (char *)dst; - char const *src= src_base; - char *d= dst_base; - size_t j; + if (my_base64_decoder_skip_____**spaces(decoder))
+ return TRUE; /* End-of-input */
- while (i < len) + if (!my_base64_add(decoder)) /* Valid base64 character found */ { - unsigned c= 0; - size_t mark= 0; + if (decoder->mark) + { + /* If we have scanned '=' already, then only '=' is valid */ + DBUG_ASSERT(decoder->state == 3); + decoder->error= 1; + decoder->src--; + return TRUE; /* expected '=', but encoding character found */ + } + decoder->state++; + return FALSE; + }
If you want to have the funtion inline, then move the following to another not inline function; We don't need to generate code for this for every function call:
+ + /* Process error */ + switch (decoder->state) + { + case 0: + case 1: + decoder->src--; + return TRUE; /* base64 character expected */ + break; + + case 2: + case 3: + if (decoder->src[-1] == '=') + { + decoder->error= 0; /* Not an error - it's a pad character */ + decoder->mark++; + } + else + { + decoder->src--; + return TRUE; /* base64 character or '=' expected */ + } + break; + default: + DBUG_ASSERT(0); + return TRUE; /* Wrong state, should not happen */ + }
<cut>
Add to the comment for base64_decode that the 'dst' buffer has to be at least 2 character longer than what is needed to decode src_base.
I added this comment:
* Note: 'dst' must have sufficient space to store the decoded data. * Use base64_needed_decoded_length() to calculate the correct space size.
+int +base64_decode(const char *src_base, size_t len, + void *dst, const char **end_ptr, int flags) +{ + if (my_base64_decoder_getch(&____**decoder) || + my_base64_decoder_getch(&____**decoder) || + my_base64_decoder_getch(&____**decoder) || + my_base64_decoder_getch(&____**decoder))
+ break;
Generating the error handling with a switch for every of the above is wasteful. See previous comment.
Right. my_base64_decoder_getch() is not inline anymore.
+ /* Return error if there are more non-space characters */ + decoder.state= 0; + if (!my_base64_decoder_skip_____** spaces(&decoder))
+ decoder.error= 1; + if (end_ptr != NULL)
Note that base64_needed_decoded_length() used ceil() when it's not needed.
Removed.
<cut>
=== modified file 'sql/item_strfunc.cc' @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix____**_length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix_____** length_and_dec()
+{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(**____),
DERIVATION_COERCIBLE,
MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this.
Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
This is because base64_encode() and base64_decode() are used in the replication code using "int" as return value.
When exposing TO_BASE64() and FROM_BASE64() to the SQL level, I did not want to touch the replication code.
+ if (args[0]->max_length > (uint) base64_encode_max_arg_length()**____) + { + maybe_null= 1; + fix_char_length_ulonglong((___**_ulonglong) base64_encode_max_arg_length()**____); + } + else + { + int length= base64_needed_encoded_length((** ____int)
args[0]->max_length); + DBUG_ASSERT(length > 0);
Don't think assert is needed as the function gurantees it already.
+ fix_char_length_ulonglong((___**_ulonglong)
length - 1); + } +}
<cut>
+String *Item_func_from_base64::val___**__str(String
*str)
+{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((**____int) res->length())) > + current_thd->variables.max____**_allowed_packet)) ||
+ tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null.
Something like 'Malformed base64 string. Error at position %d" would be nice.
Good idea. I have added a warning. Now it looks clear what went wrong if FROM_BASE64() returns NULL.
+ null_value= 1; // NULL input, too long input, OOM, or badly formed input + if (too_long) + { + push_warning_printf(current___**__thd, Sql_condition::WARN_LEVEL_____**WARN, + ER_WARN_ALLOWED_PACKET_____**OVERFLOWED, + ER(ER_WARN_ALLOWED_PACKET_____**OVERFLOWED), func_name(), + current_thd->variables.max____**_allowed_packet);
+ } + return 0; + } + tmp_value.length((uint) length); + null_value= 0; + return &tmp_value; +}
<cut>
<cut>
______________________________**___________________ Mailing list: https://launchpad.net/~maria-_**_developers<https://launchpad.net/~maria-__developers> <https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers>
Post to : maria-developers@lists.__launc**hpad.net<http://launchpad.net> <mailto:maria-developers@**lists.launchpad.net<maria-developers@lists.launchpad.net>
<mailto:maria-developers@__lis**ts.launchpad.net<http://lists.launchpad.net> <mailto:maria-developers@**lists.launchpad.net<maria-developers@lists.launchpad.net>
Unsubscribe : https://launchpad.net/~maria-_**_developers<https://launchpad.net/~maria-__developers> <https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers>
More help : https://help.launchpad.net/__**ListHelp<https://help.launchpad.net/__ListHelp>
<https://help.launchpad.net/**ListHelp<https://help.launchpad.net/ListHelp>
-- Roberto Spadim SPAEmpresarial
-- Roberto Spadim SPAEmpresarial
-- Roberto Spadim SPAEmpresarial
Just one information for everybody it's a veeeeery verrrrry old bug at mysql reported by me :) http://bugs.mysql.com/bug.php?id=18861 Add base64_encode, base64_decode functions 6 Apr 2006 18:00 thanks mariadb team! =) 7 years old bug resolved =) 2013/9/23 Roberto Spadim <roberto@spadim.com.br>
thanks alexander! :D i will read and test it as soon as possible :)
2013/9/23 Alexander Barkov <bar@mariadb.org>
Hi Roberto,
the patch has been pushed to the 10.0 tree:
lp:~maria-captains/maria/10.0
See here for more details about the 10.0 tree. https://launchpad.net/maria/**10.0 <https://launchpad.net/maria/10.0>
On 09/17/2013 07:42 PM, Roberto Spadim wrote:
nice =) after this patch commited, could report where i could get it? i will try to rewrite to base32 if possible :) and send at jira thanks!
2013/9/17 Alexander Barkov <bar@mariadb.org <mailto:bar@mariadb.org>>
Hello Roberto,
On 09/17/2013 06:38 PM, Roberto Spadim wrote:
hi guys! one question, as a mariadb user... base64 will be exposed to sql layer? could i use TO_BASE64() and FROM_BASE64() at mysql client? there's a MDEV in JIRA about it, if you could attach it to this patch could be nice: MDEV-4387 <https://mariadb.atlassian.__**net/browse/MDEV-4387
<https://mariadb.atlassian.**net/browse/MDEV-4387<https://mariadb.atlassian.net/browse/MDEV-4387>
and with time, implement base32 and base16: MDEV-4800 <https://mariadb.atlassian.__**net/browse/MDEV-4800
<https://mariadb.atlassian.**net/browse/MDEV-4800<https://mariadb.atlassian.net/browse/MDEV-4800>
base64 is very usefull, and many php users use it to "send/receive" files from database base32 from what i know is usefull only for OTP and other password apps base16 is a hexadecimal function HEX() and UNHEX()
Yes, TO_BASE64() and FROM_BASE64() will be exposed to the SQL level, so one case use for example things like this:
INSERT INTO t1 VALUES (FROM_BASE64('base64-string'))**__;
or
SELECT TO_BASE64(column) FROM t1;
There are no plans to implement base32.
thanks guys
2013/9/17 Alexander Barkov <bar@mariadb.org <mailto:bar@mariadb.org> <mailto:bar@mariadb.org
<mailto:bar@mariadb.org>>>
Hi Monty,
thanks for review.
I have addressed most of your suggestions. See the new version attached, and the detailed comments inline:
On 09/12/2013 04:32 PM, Michael Widenius wrote:
Hi!
Here is the review for the code that we should put into 10.0
First the base64:
=== modified file 'mysql-test/t/func_str.test' --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
This code is (almost) copy-and-paste from MySQL-5.6. I think it's a good idea when looking inside a test file to be able to see which tests are coming from MySQL-5.6, and which tests are coming from MariaDB-10.0.
I'd suggest to keep "Start of 5.6 tests" in this particular case, and also when merging the tests for the other merged MySQL-5.6 features.
=== modified file 'mysys/base64.c' --- mysys/base64.c 2011-06-30 15:46:53 +0000 +++ mysys/base64.c 2013-03-09 06:22:59 +0000 @@ -1,5 +1,4 @@ -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc. - Use is subject to license terms. +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
Removed 'all rights reserved'. You can't have that for GPL code.
If there is any new code from us, please add a copyright message for the MariaDB foundation too!
Removed 'all rights reserved' and added "MariaDB", as there are some our own changes.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ
"abcdefghijklmnopqrstuvwxyz" "0123456789+/";
+/** + * Maximum length base64_needed_encoded_length() + * can handle without overflow. + */ +int +base64_encode_max_arg_length(**____)
+{ +#if (SIZEOF_INT == 8) + /* + 6827690988321067803 -> 9223372036854775805 + 6827690988321067804 -> -9223372036854775807 + */
Please describe from where the following number comes from (so that anyone can recalculate and check the number if needed):
I added more comments where these numbers come from.
+ return 0x5EC0D4C77B03531BLL +#else + /* + 1589695686 -> 2147483646 <tel:2147483646> <tel:2147483646 <tel:2147483646>>
+ 1589695687 -> -2147483645 <tel:2147483645> <tel:2147483645 <tel:2147483645>>
+ */ + return 0x5EC0D4C6; +#endif +} +
int base64_needed_encoded_length(_**___int length_of_data) @@ -39,6 +60,21 @@ base64_needed_encoded_length(_**___int length_
}
+/** + * Maximum length base64_needed_decoded_length() + * can handle without overflow. + */ +int +base64_decode_max_arg_length(**____)
+{
The following also need a comment.
Also added comments.
+#if (SIZEOF_INT == 8) + return 0x2AAAAAAAAAAAAAAALL; +#else + return 0x2AAAAAAA; +#endif +} + +
<cut>
+/** + * Get the next character from a base64 encoded stream. + * Skip spaces, then scan the next base64 character or a pad character + * and collect bits into decoder->c. + * + * @param decoder Pointer to MY_BASE64_DECODER + * @return + * FALSE on success (a valid base64 encoding character found) + * TRUE on error (unexpected character or unexpected end-of-input found) + */
Add an empty line here.
Done.
It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification).
Some thing I figured out by asking bar on IRC:
- We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
I added a reference to http://en.wikipedia.org/wiki/_**___Base64<http://en.wikipedia.org/wiki/____Base64> <http://en.wikipedia.org/wiki/**__Base64<http://en.wikipedia.org/wiki/__Base64>
<http://en.wikipedia.org/wiki/**__Base64<http://en.wikipedia.org/wiki/__Base64> <http://en.wikipedia.org/wiki/**Base64<http://en.wikipedia.org/wiki/Base64>
,
as well all added these your comments in proper places of the code.
+static inline my_bool +my_base64_decoder_getch(MY___**__BASE64_DECODER
*decoder)
Remove inline from the above; It's likely to make the code slower, not faster as this is a big function with lot of if's.
Removed "inline".
{ - char b[3]; - size_t i= 0; - char *dst_base= (char *)dst; - char const *src= src_base; - char *d= dst_base; - size_t j; + if (my_base64_decoder_skip_____** spaces(decoder))
+ return TRUE; /* End-of-input */
- while (i < len) + if (!my_base64_add(decoder)) /* Valid base64 character found */ { - unsigned c= 0; - size_t mark= 0; + if (decoder->mark) + { + /* If we have scanned '=' already, then only '=' is valid */ + DBUG_ASSERT(decoder->state == 3); + decoder->error= 1; + decoder->src--; + return TRUE; /* expected '=', but encoding character found */ + } + decoder->state++; + return FALSE; + }
If you want to have the funtion inline, then move the following to another not inline function; We don't need to generate code for this for every function call:
+ + /* Process error */ + switch (decoder->state) + { + case 0: + case 1: + decoder->src--; + return TRUE; /* base64 character expected */ + break; + + case 2: + case 3: + if (decoder->src[-1] == '=') + { + decoder->error= 0; /* Not an error - it's a pad character */ + decoder->mark++; + } + else + { + decoder->src--; + return TRUE; /* base64 character or '=' expected */ + } + break; + default: + DBUG_ASSERT(0); + return TRUE; /* Wrong state, should not happen */ + }
<cut>
Add to the comment for base64_decode that the 'dst' buffer has to be at least 2 character longer than what is needed to decode src_base.
I added this comment:
* Note: 'dst' must have sufficient space to store the decoded data. * Use base64_needed_decoded_length() to calculate the correct space size.
+int +base64_decode(const char *src_base, size_t len, + void *dst, const char **end_ptr, int flags) +{ + if (my_base64_decoder_getch(&____**decoder) || + my_base64_decoder_getch(&____**decoder) || + my_base64_decoder_getch(&____**decoder) || + my_base64_decoder_getch(&____**decoder))
+ break;
Generating the error handling with a switch for every of the above is wasteful. See previous comment.
Right. my_base64_decoder_getch() is not inline anymore.
+ /* Return error if there are more non-space characters */ + decoder.state= 0; + if (!my_base64_decoder_skip_____** spaces(&decoder))
+ decoder.error= 1; + if (end_ptr != NULL)
Note that base64_needed_decoded_length() used ceil() when it's not needed.
Removed.
<cut>
=== modified file 'sql/item_strfunc.cc' @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix____**_length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix_____** length_and_dec()
+{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(**____),
DERIVATION_COERCIBLE,
MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this.
Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
This is because base64_encode() and base64_decode() are used in the replication code using "int" as return value.
When exposing TO_BASE64() and FROM_BASE64() to the SQL level, I did not want to touch the replication code.
+ if (args[0]->max_length > (uint) base64_encode_max_arg_length()**____) + { + maybe_null= 1; + fix_char_length_ulonglong((___**_ulonglong) base64_encode_max_arg_length()**____); + } + else + { + int length= base64_needed_encoded_length((** ____int)
args[0]->max_length); + DBUG_ASSERT(length > 0);
Don't think assert is needed as the function gurantees it already.
+ fix_char_length_ulonglong((___**_ulonglong)
length - 1); + } +}
<cut>
+String *Item_func_from_base64::val___** __str(String
*str)
+{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((**____int) res->length())) > + current_thd->variables.max____**_allowed_packet)) ||
+ tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null.
Something like 'Malformed base64 string. Error at position %d" would be nice.
Good idea. I have added a warning. Now it looks clear what went wrong if FROM_BASE64() returns NULL.
+ null_value= 1; // NULL input, too long input, OOM, or badly formed input + if (too_long) + { + push_warning_printf(current___**__thd, Sql_condition::WARN_LEVEL_____**WARN, + ER_WARN_ALLOWED_PACKET_____**OVERFLOWED, + ER(ER_WARN_ALLOWED_PACKET_____**OVERFLOWED), func_name(), + current_thd->variables.max____**_allowed_packet);
+ } + return 0; + } + tmp_value.length((uint) length); + null_value= 0; + return &tmp_value; +}
<cut>
<cut>
______________________________**___________________ Mailing list: https://launchpad.net/~maria-_**_developers<https://launchpad.net/~maria-__developers> <https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers>
Post to : maria-developers@lists.__launc**hpad.net<http://launchpad.net> <mailto:maria-developers@**lists.launchpad.net<maria-developers@lists.launchpad.net>
<mailto:maria-developers@__lis**ts.launchpad.net<http://lists.launchpad.net> <mailto:maria-developers@**lists.launchpad.net<maria-developers@lists.launchpad.net>
Unsubscribe : https://launchpad.net/~maria-_**_developers<https://launchpad.net/~maria-__developers> <https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers>
More help : https://help.launchpad.net/__**ListHelp<https://help.launchpad.net/__ListHelp>
<https://help.launchpad.net/**ListHelp<https://help.launchpad.net/ListHelp>
-- Roberto Spadim SPAEmpresarial
-- Roberto Spadim SPAEmpresarial
-- Roberto Spadim SPAEmpresarial
-- Roberto Spadim SPAEmpresarial
In http://www.ijcit.com/archives/volume2/issue5/Paper020519.pdf For some reason MySQL/MariaDB is not mentioned. R: -- Jan Lindström Principal Engineer MariaDB | MaxScale | skype: jan_p_lindstrom www.skysql.com <http://www.skysql.com/> Twitter <http://twitter.com/skysql> Blog <http://www.skysql.com/blog/> Facebook <http://www.facebook.com/skysql> LinkedIn <http://www.linkedin.com/company/1214250> Google+ <https://plus.google.com/117544963211695643458/posts>
Guys, i think it's important to explain that we use RFC 2045 maybe at DOCS and maybe at SOURCE too just to make things more clear, php users and others users that only know about base64_encode/decode, don't know about many RFCs for each case... 2013/9/23 Roberto Spadim <roberto@spadim.com.br>
Just one information for everybody it's a veeeeery verrrrry old bug at mysql reported by me :)
http://bugs.mysql.com/bug.php?id=18861 Add base64_encode, base64_decode functions 6 Apr 2006 18:00
thanks mariadb team! =) 7 years old bug resolved =)
2013/9/23 Roberto Spadim <roberto@spadim.com.br>
thanks alexander! :D i will read and test it as soon as possible :)
2013/9/23 Alexander Barkov <bar@mariadb.org>
Hi Roberto,
the patch has been pushed to the 10.0 tree:
lp:~maria-captains/maria/10.0
See here for more details about the 10.0 tree. https://launchpad.net/maria/**10.0 <https://launchpad.net/maria/10.0>
On 09/17/2013 07:42 PM, Roberto Spadim wrote:
nice =) after this patch commited, could report where i could get it? i will try to rewrite to base32 if possible :) and send at jira thanks!
2013/9/17 Alexander Barkov <bar@mariadb.org <mailto:bar@mariadb.org>>
Hello Roberto,
On 09/17/2013 06:38 PM, Roberto Spadim wrote:
hi guys! one question, as a mariadb user... base64 will be exposed to sql layer? could i use TO_BASE64() and FROM_BASE64() at mysql client? there's a MDEV in JIRA about it, if you could attach it to this patch could be nice: MDEV-4387 <https://mariadb.atlassian.__**net/browse/MDEV-4387
<https://mariadb.atlassian.**net/browse/MDEV-4387<https://mariadb.atlassian.net/browse/MDEV-4387>
and with time, implement base32 and base16: MDEV-4800 <https://mariadb.atlassian.__**net/browse/MDEV-4800
<https://mariadb.atlassian.**net/browse/MDEV-4800<https://mariadb.atlassian.net/browse/MDEV-4800>
base64 is very usefull, and many php users use it to "send/receive" files from database base32 from what i know is usefull only for OTP and other password apps base16 is a hexadecimal function HEX() and UNHEX()
Yes, TO_BASE64() and FROM_BASE64() will be exposed to the SQL level, so one case use for example things like this:
INSERT INTO t1 VALUES (FROM_BASE64('base64-string'))**__;
or
SELECT TO_BASE64(column) FROM t1;
There are no plans to implement base32.
thanks guys
2013/9/17 Alexander Barkov <bar@mariadb.org <mailto:bar@mariadb.org> <mailto:bar@mariadb.org
<mailto:bar@mariadb.org>>>
Hi Monty,
thanks for review.
I have addressed most of your suggestions. See the new version attached, and the detailed comments inline:
On 09/12/2013 04:32 PM, Michael Widenius wrote:
Hi!
Here is the review for the code that we should put into 10.0
First the base64:
=== modified file 'mysql-test/t/func_str.test' --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
This code is (almost) copy-and-paste from MySQL-5.6. I think it's a good idea when looking inside a test file to be able to see which tests are coming from MySQL-5.6, and which tests are coming from MariaDB-10.0.
I'd suggest to keep "Start of 5.6 tests" in this particular case, and also when merging the tests for the other merged MySQL-5.6 features.
=== modified file 'mysys/base64.c' --- mysys/base64.c 2011-06-30 15:46:53 +0000 +++ mysys/base64.c 2013-03-09 06:22:59 +0000 @@ -1,5 +1,4 @@ -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc. - Use is subject to license terms. +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
Removed 'all rights reserved'. You can't have that for GPL code.
If there is any new code from us, please add a copyright message for the MariaDB foundation too!
Removed 'all rights reserved' and added "MariaDB", as there are some our own changes.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ
"abcdefghijklmnopqrstuvwxyz" "0123456789+/";
+/** + * Maximum length base64_needed_encoded_length() + * can handle without overflow. + */ +int +base64_encode_max_arg_length(**____)
+{ +#if (SIZEOF_INT == 8) + /* + 6827690988321067803 -> 9223372036854775805 + 6827690988321067804 -> -9223372036854775807 + */
Please describe from where the following number comes from (so that anyone can recalculate and check the number if needed):
I added more comments where these numbers come from.
+ return 0x5EC0D4C77B03531BLL +#else + /* + 1589695686 -> 2147483646 <tel:2147483646> <tel:2147483646 <tel:2147483646>>
+ 1589695687 -> -2147483645 <tel:2147483645> <tel:2147483645 <tel:2147483645>>
+ */ + return 0x5EC0D4C6; +#endif +} +
int base64_needed_encoded_length(_**___int length_of_data) @@ -39,6 +60,21 @@ base64_needed_encoded_length(_**___int length_
}
+/** + * Maximum length base64_needed_decoded_length() + * can handle without overflow. + */ +int +base64_decode_max_arg_length(**____)
+{
The following also need a comment.
Also added comments.
+#if (SIZEOF_INT == 8) + return 0x2AAAAAAAAAAAAAAALL; +#else + return 0x2AAAAAAA; +#endif +} + +
<cut>
+/** + * Get the next character from a base64 encoded stream. + * Skip spaces, then scan the next base64 character or a pad character + * and collect bits into decoder->c. + * + * @param decoder Pointer to MY_BASE64_DECODER + * @return + * FALSE on success (a valid base64 encoding character found) + * TRUE on error (unexpected character or unexpected end-of-input found) + */
Add an empty line here.
Done.
It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification).
Some thing I figured out by asking bar on IRC:
- We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
I added a reference to http://en.wikipedia.org/wiki/_**___Base64<http://en.wikipedia.org/wiki/____Base64> <http://en.wikipedia.org/wiki/**__Base64<http://en.wikipedia.org/wiki/__Base64>
<http://en.wikipedia.org/wiki/**__Base64<http://en.wikipedia.org/wiki/__Base64> <http://en.wikipedia.org/wiki/**Base64<http://en.wikipedia.org/wiki/Base64>
,
as well all added these your comments in proper places of the code.
+static inline my_bool +my_base64_decoder_getch(MY___**__BASE64_DECODER
*decoder)
Remove inline from the above; It's likely to make the code slower, not faster as this is a big function with lot of if's.
Removed "inline".
{ - char b[3]; - size_t i= 0; - char *dst_base= (char *)dst; - char const *src= src_base; - char *d= dst_base; - size_t j; + if (my_base64_decoder_skip_____** spaces(decoder))
+ return TRUE; /* End-of-input */
- while (i < len) + if (!my_base64_add(decoder)) /* Valid base64 character found */ { - unsigned c= 0; - size_t mark= 0; + if (decoder->mark) + { + /* If we have scanned '=' already, then only '=' is valid */ + DBUG_ASSERT(decoder->state == 3); + decoder->error= 1; + decoder->src--; + return TRUE; /* expected '=', but encoding character found */ + } + decoder->state++; + return FALSE; + }
If you want to have the funtion inline, then move the following to another not inline function; We don't need to generate code for this for every function call:
+ + /* Process error */ + switch (decoder->state) + { + case 0: + case 1: + decoder->src--; + return TRUE; /* base64 character expected */ + break; + + case 2: + case 3: + if (decoder->src[-1] == '=') + { + decoder->error= 0; /* Not an error - it's a pad character */ + decoder->mark++; + } + else + { + decoder->src--; + return TRUE; /* base64 character or '=' expected */ + } + break; + default: + DBUG_ASSERT(0); + return TRUE; /* Wrong state, should not happen */ + }
<cut>
Add to the comment for base64_decode that the 'dst' buffer has to be at least 2 character longer than what is needed to decode src_base.
I added this comment:
* Note: 'dst' must have sufficient space to store the decoded data. * Use base64_needed_decoded_length() to calculate the correct space size.
+int +base64_decode(const char *src_base, size_t len, + void *dst, const char **end_ptr, int flags) +{ + if (my_base64_decoder_getch(&____**decoder) || + my_base64_decoder_getch(&____**decoder) || + my_base64_decoder_getch(&____**decoder) || + my_base64_decoder_getch(&____**decoder))
+ break;
Generating the error handling with a switch for every of the above is wasteful. See previous comment.
Right. my_base64_decoder_getch() is not inline anymore.
+ /* Return error if there are more non-space characters */ + decoder.state= 0; + if (!my_base64_decoder_skip_____** spaces(&decoder))
+ decoder.error= 1; + if (end_ptr != NULL)
Note that base64_needed_decoded_length() used ceil() when it's not needed.
Removed.
<cut>
=== modified file 'sql/item_strfunc.cc' @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix____**_length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix_____** length_and_dec()
+{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(**____),
DERIVATION_COERCIBLE,
MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this.
Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
This is because base64_encode() and base64_decode() are used in the replication code using "int" as return value.
When exposing TO_BASE64() and FROM_BASE64() to the SQL level, I did not want to touch the replication code.
+ if (args[0]->max_length > (uint) base64_encode_max_arg_length()**____) + { + maybe_null= 1; + fix_char_length_ulonglong((___**_ulonglong) base64_encode_max_arg_length()**____); + } + else + { + int length= base64_needed_encoded_length((** ____int)
args[0]->max_length); + DBUG_ASSERT(length > 0);
Don't think assert is needed as the function gurantees it already.
+ fix_char_length_ulonglong((___**_ulonglong)
length - 1); + } +}
<cut>
+String *Item_func_from_base64::val___** __str(String
*str)
+{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((**____int) res->length())) > + current_thd->variables.max____**_allowed_packet)) ||
+ tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null.
Something like 'Malformed base64 string. Error at position %d" would be nice.
Good idea. I have added a warning. Now it looks clear what went wrong if FROM_BASE64() returns NULL.
+ null_value= 1; // NULL input, too long input, OOM, or badly formed input + if (too_long) + { + push_warning_printf(current___**__thd, Sql_condition::WARN_LEVEL_____**WARN, + ER_WARN_ALLOWED_PACKET_____**OVERFLOWED, + ER(ER_WARN_ALLOWED_PACKET_____**OVERFLOWED), func_name(), + current_thd->variables.max____** _allowed_packet);
+ } + return 0; + } + tmp_value.length((uint) length); + null_value= 0; + return &tmp_value; +}
<cut>
<cut>
______________________________**___________________ Mailing list: https://launchpad.net/~maria-_**_developers<https://launchpad.net/~maria-__developers> <https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers>
Post to : maria-developers@lists.__launc**hpad.net<http://launchpad.net> <mailto:maria-developers@**lists.launchpad.net<maria-developers@lists.launchpad.net>
<mailto:maria-developers@__lis**ts.launchpad.net<http://lists.launchpad.net> <mailto:maria-developers@**lists.launchpad.net<maria-developers@lists.launchpad.net>
Unsubscribe : https://launchpad.net/~maria-_**_developers<https://launchpad.net/~maria-__developers> <https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers>
More help : https://help.launchpad.net/__**ListHelp<https://help.launchpad.net/__ListHelp>
<https://help.launchpad.net/**ListHelp<https://help.launchpad.net/ListHelp>
-- Roberto Spadim SPAEmpresarial
-- Roberto Spadim SPAEmpresarial
-- Roberto Spadim SPAEmpresarial
-- Roberto Spadim SPAEmpresarial
-- Roberto Spadim SPAEmpresarial
Sorry if i'm posting too much, it's my last post for this ... intbase64_decode_max_arg_length(){#if (SIZEOF_INT == 8) return 0x7FFFFFFFFFFFFFFFLL;#else return 0x7FFFFFFF;#endif} if we reeturn LL (longlong) with a int function, this will cause overflow/warnings at gcc? will the long long be returned with the right value? intmain(void){ base64.c is a binary program too? could we compile it as mariadb_base64 ? this help debuging something like ENABLE-BASE64-BINARY, or some README file showing how to do this well :) that's all :) at weekend i report if i found something strange or bugs :) bye
On 09/23/2013 09:25 PM, Roberto Spadim wrote:
Sorry if i'm posting too much, it's my last post for this ...
int base64_decode_max_arg_length() { #if (SIZEOF_INT == 8) return 0x7FFFFFFFFFFFFFFFLL; #else return 0x7FFFFFFF; #endif }
Oops. Some remainders after moving from "size_t" to "int" (to avoid changes in the replication code which relies on "int"). SIZEOF_INT can never actually be 8. So perhaps #if is not really needed.
if we reeturn LL (longlong) with a int function, this will cause overflow/warnings at gcc? will the long long be returned with the right value?
int main(void) {
base64.c is a binary program too? could we compile it as mariadb_base64 ? this help debuging something like ENABLE-BASE64-BINARY, or some README file showing how to do this
It is not a binary program. I guess the original author of this file used this main() for test purposes. main() could probably be removed.
well :) that's all :) at weekend i report if i found something strange or bugs :) bye
Hi!
"Alexander" == Alexander Barkov <bar@mariadb.org> writes:
Alexander> Hi Monty, Alexander> thanks for review. Alexander> I have addressed most of your suggestions. See the new version attached, Alexander> and the detailed comments inline: Alexander> On 09/12/2013 04:32 PM, Michael Widenius wrote:
Hi!
Here is the review for the code that we should put into 10.0
First the base64:
=== modified file 'mysql-test/t/func_str.test' --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
Alexander> This code is (almost) copy-and-paste from MySQL-5.6. Alexander> I think it's a good idea when looking inside a test file Alexander> to be able to see which tests are coming from MySQL-5.6, Alexander> and which tests are coming from MariaDB-10.0. Alexander> I'd suggest to keep "Start of 5.6 tests" in this particular case, Alexander> and also when merging the tests for the other merged MySQL-5.6 features. As long a this is an exact copy paste it's ok. <cut>
It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification).
Some thing I figured out by asking bar on IRC:
- We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
Alexander> I added a reference to http://en.wikipedia.org/wiki/Base64, Alexander> as well all added these your comments in proper places of the code. Thanks. That will make the code much easier to understand. I just read trough the definition and noticed that some versions doesn't use '=' padding. Should we allow not '=' padding in our decoder too? (I think it's best to always pad on encoding). <cut>
=== modified file 'sql/item_strfunc.cc' @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix_length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix_length_and_dec() +{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(), DERIVATION_COERCIBLE, MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this.
Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
Alexander> This is because base64_encode() and base64_decode() are used Alexander> in the replication code using "int" as return value. Alexander> When exposing TO_BASE64() and FROM_BASE64() to the SQL level, Alexander> I did not want to touch the replication code. ok. We should however at some case fix that. <cut>
+String *Item_func_from_base64::val_str(String *str) +{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((int) res->length())) > + current_thd->variables.max_allowed_packet)) || + tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null.
Something like 'Malformed base64 string. Error at position %d" would be nice.
Alexander> Good idea. I have added a warning. Now it looks clear what went wrong Alexander> if FROM_BASE64() returns NULL. Thanks! Regards, Monty
Hi Monty, On 09/17/2013 08:12 PM, Michael Widenius wrote:
Hi!
"Alexander" == Alexander Barkov <bar@mariadb.org> writes:
Alexander> Hi Monty, Alexander> thanks for review.
Alexander> I have addressed most of your suggestions. See the new version attached, Alexander> and the detailed comments inline:
Alexander> On 09/12/2013 04:32 PM, Michael Widenius wrote:
Hi!
Here is the review for the code that we should put into 10.0
First the base64:
=== modified file 'mysql-test/t/func_str.test' --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
Alexander> This code is (almost) copy-and-paste from MySQL-5.6. Alexander> I think it's a good idea when looking inside a test file Alexander> to be able to see which tests are coming from MySQL-5.6, Alexander> and which tests are coming from MariaDB-10.0.
Alexander> I'd suggest to keep "Start of 5.6 tests" in this particular case, Alexander> and also when merging the tests for the other merged MySQL-5.6 features.
As long a this is an exact copy paste it's ok.
<cut>
It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification).
Some thing I figured out by asking bar on IRC:
- We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
Alexander> I added a reference to http://en.wikipedia.org/wiki/Base64, Alexander> as well all added these your comments in proper places of the code.
Thanks. That will make the code much easier to understand.
I just read trough the definition and noticed that some versions doesn't use '=' padding.
Should we allow not '=' padding in our decoder too? (I think it's best to always pad on encoding).
http://dev.mysql.com/doc/refman/5.6/en/string-functions.html#function_to-bas... says:
Different base-64 encoding schemes exist. These are the encoding and decoding rules used by TO_BASE64() and FROM_BASE64(): ... * Encoded output consists of groups of 4 printable characters. Each 3 bytes of the input data are encoded using 4 characters. If the last group is incomplete, it is padded with '=' characters to a length of 4.
So we always pad on encoding. So do most of the modern pieces of software. So does PHP: http://php.net/manual/en/function.base64-encode.php So does Linux "bas64" program: $ echo "123" |base64 MTIzCg== This is why we always require pad characters on decoding. I don't think we should accept not-padded values.
<cut>
=== modified file 'sql/item_strfunc.cc' @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix_length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix_length_and_dec() +{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(), DERIVATION_COERCIBLE, MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this.
Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
Alexander> This is because base64_encode() and base64_decode() are used Alexander> in the replication code using "int" as return value.
Alexander> When exposing TO_BASE64() and FROM_BASE64() to the SQL level, Alexander> I did not want to touch the replication code.
ok. We should however at some case fix that.
<cut>
+String *Item_func_from_base64::val_str(String *str) +{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((int) res->length())) > + current_thd->variables.max_allowed_packet)) || + tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null.
Something like 'Malformed base64 string. Error at position %d" would be nice.
Alexander> Good idea. I have added a warning. Now it looks clear what went wrong Alexander> if FROM_BASE64() returns NULL.
Thanks!
Regards, Monty
On 09/17/2013 08:52 PM, Alexander Barkov wrote:
Hi Monty,
On 09/17/2013 08:12 PM, Michael Widenius wrote:
Hi!
> "Alexander" == Alexander Barkov <bar@mariadb.org> writes:
Alexander> Hi Monty, Alexander> thanks for review.
Alexander> I have addressed most of your suggestions. See the new version attached, Alexander> and the detailed comments inline:
Alexander> On 09/12/2013 04:32 PM, Michael Widenius wrote:
Hi!
Here is the review for the code that we should put into 10.0
First the base64:
=== modified file 'mysql-test/t/func_str.test' --- mysql-test/t/func_str.test 2013-05-07 11:05:09 +0000 +++ mysql-test/t/func_str.test 2013-08-28 13:14:24 +0000 @@ -1555,3 +1555,118 @@ drop table t1,t2; --echo # End of 5.5 tests --echo #
+ +--echo # +--echo # Start of 5.6 tests +--echo #
Shouldn't this be start of 10.0 tests ? (I know that this code is from MySQL 5.6, but still for us this is 10.0...)
Alexander> This code is (almost) copy-and-paste from MySQL-5.6. Alexander> I think it's a good idea when looking inside a test file Alexander> to be able to see which tests are coming from MySQL-5.6, Alexander> and which tests are coming from MariaDB-10.0.
Alexander> I'd suggest to keep "Start of 5.6 tests" in this particular case, Alexander> and also when merging the tests for the other merged MySQL-5.6 features.
As long a this is an exact copy paste it's ok.
<cut>
It would also be good to have a good description of the base64 format in this file so that one can understand what the functions are supposed to do (or at least a link to the specification).
Some thing I figured out by asking bar on IRC:
- We never generate spaces - When decoding, we allow spaces anywhere. - The end of an 64 base decoded string is always padded with '=' so that the final length is dividable with 4.
Alexander> I added a reference to http://en.wikipedia.org/wiki/Base64, Alexander> as well all added these your comments in proper places of the code.
Thanks. That will make the code much easier to understand.
I just read trough the definition and noticed that some versions doesn't use '=' padding.
Should we allow not '=' padding in our decoder too? (I think it's best to always pad on encoding).
http://dev.mysql.com/doc/refman/5.6/en/string-functions.html#function_to-bas...
says:
Different base-64 encoding schemes exist. These are the encoding and decoding rules used by TO_BASE64() and FROM_BASE64(): ... * Encoded output consists of groups of 4 printable characters. Each 3 bytes of the input data are encoded using 4 characters. If the last group is incomplete, it is padded with '=' characters to a length of 4.
So we always pad on encoding.
So do most of the modern pieces of software.
So does PHP: http://php.net/manual/en/function.base64-encode.php
So does Linux "bas64" program:
$ echo "123" |base64 MTIzCg==
This is why we always require pad characters on decoding. I don't think we should accept not-padded values.
Just checked PostgreSQL. They also pad on encode, and do not accept non-properly padded values on decode: bar=> select decode('MQ==','base64'); decode -------- \x31 (1 row) bar=> select decode('MQ=','base64'); ERROR: invalid end sequence bar=> select decode('MQ','base64'); ERROR: invalid end sequence
<cut>
=== modified file 'sql/item_strfunc.cc' @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix_length_a set_persist_maybe_null(1); }
+ +void Item_func_to_base64::fix_length_and_dec() +{ + maybe_null= args[0]->maybe_null; + collation.set(default_charset(), DERIVATION_COERCIBLE, MY_REPERTOIRE_ASCII);
Maybe better to cast both arguments to (ulonglong) as the rest of the code is depending on this.
Also, why is base64_encode_max_arg_length() int instead of uint or even better size_t?
Alexander> This is because base64_encode() and base64_decode() are used Alexander> in the replication code using "int" as return value.
Alexander> When exposing TO_BASE64() and FROM_BASE64() to the SQL level, Alexander> I did not want to touch the replication code.
ok. We should however at some case fix that.
<cut>
+String *Item_func_from_base64::val_str(String *str) +{ + String *res= args[0]->val_str_ascii(str); + bool too_long= false; + int length; + const char *end_ptr; + + if (!res || + res->length() > (uint) base64_decode_max_arg_length() || + (too_long= + ((uint) (length= base64_needed_decoded_length((int) res->length())) > + current_thd->variables.max_allowed_packet)) || + tmp_value.alloc((uint) length) || + (length= base64_decode(res->ptr(), (int) res->length(), + (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 || + end_ptr < res->ptr() + res->length()) + {
Shouldn't we get a readable error or warning if the string contained wrong characters? Now it looks like we will just return null.
Something like 'Malformed base64 string. Error at position %d" would be nice.
Alexander> Good idea. I have added a warning. Now it looks clear what went wrong Alexander> if FROM_BASE64() returns NULL.
Thanks!
Regards, Monty
Hi!
"Alexander" == Alexander Barkov <bar@mariadb.org> writes:
<cut>
Should we allow not '=' padding in our decoder too? (I think it's best to always pad on encoding).
<cut> Alexander> Just checked PostgreSQL. They also pad on encode, Alexander> and do not accept non-properly padded values on decode: <cut> How about all the programs that don't padd with '='? There was a list of these in the wiki link you gave me ... Regards, Monty
Hi Monty, On 09/23/2013 07:16 PM, Michael Widenius wrote:
Hi!
"Alexander" == Alexander Barkov <bar@mariadb.org> writes:
<cut>
Should we allow not '=' padding in our decoder too? (I think it's best to always pad on encoding).
<cut>
Alexander> Just checked PostgreSQL. They also pad on encode, Alexander> and do not accept non-properly padded values on decode:
<cut>
How about all the programs that don't padd with '='? There was a list of these in the wiki link you gave me ...
It was a list of various variants of base64 that have existed through the time: http://en.wikipedia.org/wiki/Base64 I believe the modern base64 is associated with: - Char for index 62 = "+" - Char for index 63 = "-" - Pad character = "mandatory" Look at the table. All variants that have "+" and "-" for "Char for index 62" and "Char for index 63" have also "Pad character = mandatory". The only exception is: "Modified Base64 encoding for UTF-7 (RFC 1642, obsoleted)". But it's obsoleted long time ago! For those who wants the ancient modified Base64 for UTF-7, there is a simple workaround: mysql> set @a:='aa'; mysql> select from_base64(rpad(@a,cast(length(@a)/4 as signed)*4,'=')); +----------------------------------------------------------+ | from_base64(rpad(@a,cast(length(@a)/4 as signed)*4,'=')) | +----------------------------------------------------------+ | i | +----------------------------------------------------------+ 1 row in set (0.00 sec)
Regards, Monty
Hi!
"Alexander" == Alexander Barkov <bar@mariadb.org> writes:
<cut> Alexander> http://en.wikipedia.org/wiki/Base64 Alexander> I believe the modern base64 is associated with: Alexander> - Char for index 62 = "+" Alexander> - Char for index 63 = "-" Alexander> - Pad character = "mandatory" Alexander> Look at the table. All variants that have "+" and "-" Alexander> for "Char for index 62" and "Char for index 63" Alexander> have also "Pad character = mandatory". Sorry, don't understand the above. (But it was resolved on IRC that you meant + and /) Alexander> The only exception is: Alexander> "Modified Base64 encoding for UTF-7 (RFC 1642, obsoleted)". Alexander> But it's obsoleted long time ago! Alexander> For those who wants the ancient modified Base64 for UTF-7, Alexander> there is a simple workaround: mysql> set @a:='aa'; mysql> select from_base64(rpad(@a,cast(length(@a)/4 as signed)*4,'=')); Alexander> +----------------------------------------------------------+ Alexander> | from_base64(rpad(@a,cast(length(@a)/4 as signed)*4,'=')) | Alexander> +----------------------------------------------------------+ Alexander> | i | Alexander> +----------------------------------------------------------+ Alexander> 1 row in set (0.00 sec) That's ok workaround, but we should at least document this in the manual. Regards, Monty
hi guys! i used many base64 libs when i was developing to android, windows ce and windows phone some base64 don't work without == or = on the end of base64 encoded string in that time i made a workaround with "=" at the end, based on the length() of the string well direct to the point... base64_decode isn't ONE function, it could be implemented with many standards... check wikipedia for example: 1)Original Base64 for Privacy-Enhanced Mail (PEM) (RFC 1421, deprecated) 2)Base64 transfer encoding for MIME (RFC 2045) 3)Standard 'Base64' encoding for RFC 3548 or RFC 4648 4)'Radix-64' encoding for OpenPGP (RFC 4880) 5)Modified Base64 encoding for UTF-7 (RFC 1642, obsoleted) 6)Modified Base64 for filenames (non standard) 7)Base64 with URL and Filename Safe Alphabet (RFC 4648 'base64url' encoding) 8)Non-standard URL-safe Modification of Base64 used in YUI Library (Y64) 9)Modified Base64 for XML name tokens (Nmtoken) 10)Modified Base64 for XML identifiers (Name) 11)Modified Base64 for Program identifiers (variant 1, non standard) 12)Modified Base64 for Program identifiers (variant 2, non standard) 13)Modified Base64 for Regular expressions (non standard) I think the most used standard is rfc2045 (at least php use it and i think perl and others script languages too), for rfc2045 we have: Variant: Base64 transfer encoding for MIME (RFC 2045) Char for index 62: + Char for index 63: / pad char: = (mandatory) Fixed encoded line-length: No (variable) Maximum encoded line length: 76 Line separators: CR+LF Characters outside alphabet: Accepted (discarded) Line checksum: (none) in other words... the TO_BASE64 and FROM_BASE64, isn't a function with only one parameter... i think we could allow two parameters and the second parameters is "standard" default to RFC-2045, and we could accept: RFC1421, RFC2045, (RFC3545/RFC4648), RFC4880, RFC1642, "filenames", RFC4648, Y64, NMTOKEN, "name", "variant1", "variant2", "regexp" what you think? well with this we can report error, warning and others messages to help user selecting the right one i had a problem with wince5 and php, the problem was the char at position 62 one function was using "-" and the other was using "+", well i think this 13 variants of base64 function could explain the problem to everyone, right? thanks guys! 2013/9/23 Michael Widenius <monty@askmonty.org>
Hi!
"Alexander" == Alexander Barkov <bar@mariadb.org> writes:
<cut>
Should we allow not '=' padding in our decoder too? (I think it's best to always pad on encoding).
<cut>
Alexander> Just checked PostgreSQL. They also pad on encode, Alexander> and do not accept non-properly padded values on decode:
<cut>
How about all the programs that don't padd with '='? There was a list of these in the wiki link you gave me ...
Regards, Monty
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
-- Roberto Spadim SPAEmpresarial
Hi!
"Alexander" == Alexander Barkov <bar@mariadb.org> writes:
Alexander> Hi Monty, Alexander> On 09/17/2013 08:12 PM, Michael Widenius wrote: <cut> Alexander> I added a reference to http://en.wikipedia.org/wiki/Base64, Alexander> as well all added these your comments in proper places of the code.
Thanks. That will make the code much easier to understand.
I just read trough the definition and noticed that some versions doesn't use '=' padding.
Should we allow not '=' padding in our decoder too? (I think it's best to always pad on encoding).
Alexander> http://dev.mysql.com/doc/refman/5.6/en/string-functions.html#function_to-bas... Alexander> says: Alexander> > Different base-64 encoding schemes exist. These are the encoding and Alexander> > decoding rules used by TO_BASE64() and FROM_BASE64(): Alexander> > ... Alexander> > * Encoded output consists of groups of 4 printable characters. Each 3 Alexander> > bytes of the input data are encoded using 4 characters. If the last Alexander> > group is incomplete, it is padded with '=' characters to a length of Alexander> > 4. Alexander> So we always pad on encoding. This I tihnk is correct. Alexander> So do most of the modern pieces of software. Alexander> So does PHP: Alexander> http://php.net/manual/en/function.base64-encode.php Alexander> So does Linux "bas64" program: Alexander> $ echo "123" |base64 Alexander> MTIzCg== Alexander> This is why we always require pad characters on decoding. This I don't necessary agree with. One point of this function is to allow reading of coded text from other programs, not only from mysqld. As some other programs are not using padding, why would we not allow them ? Alexander> I don't think we should accept not-padded values. I still don't understand why? What is the disadvantage in that ? The only benefit I see of requiring padding on decoding is that we have 30% change to find out if a message was 'not complete'. The disadvantage is that we would have to somewhere document that we don't support base64 coding from a set of programs. What should we tell the users of these programs when/if they complain that they can't use MariaDB to read their data? <cut> Regards, Monty
Hi Monty, thanks for review. Btw, I did not tell you what the german2 patch actually includes: 1. WL#4013 Unicode german2 collation itself. 2. WL#5624 Collation customization improvements (prerequisite for german2) http://dev.mysql.com/worklog/task/?id=5624 3. Joro's patch for "Bug#62429 XML: ExtractValue, UpdateXML max arg length 127 chars." (prerequisite for WL#5624) 4. Partially, mtr tests for WL#5624. The other part of the WL#5624 tests depends on the WEIGHT_STRING() function which we have not merged yet. Comparing to the first version of the patch, the second version additionally includes: - Monty's review suggestions addressed. - Added *partially* tests for WL#5624 (those not needing WEIGHT_STRING) - Added "WL#5624 Part 13. More verbosity when reading character set definition file" which I somehow missed in the first version. Please also see comments inline: On 09/12/2013 04:32 PM, Michael Widenius wrote:
Review of german2.diff
=== modified file 'include/my_global.h' --- include/my_global.h 2013-07-21 14:39:19 +0000 +++ include/my_global.h 2013-08-29 08:27:09 +0000 @@ -1218,4 +1218,11 @@ static inline double rint(double x) #define HAVE_EXTERNAL_CLIENT #endif /* EMBEDDED_LIBRARY */
+ +enum loglevel { + ERROR_LEVEL= 0, + WARNING_LEVEL= 1, + INFORMATION_LEVEL= 2 +}; + #endif /* my_global_h */
Not sure why the above, which is a my_sys construct, should be in my_global.h. Please add at least a comment about this.
my_sys.h includes m_ctype.h. The definition of loglevel was in my_sys.h, but now it's needed in m_ctype.h as well. So I just moved the definition of loglevel to my_global.h, which is seen in both m_ctype.h and my_sys.h. That solved the problem, but as you say, introduced extra visibility of "loglevel", which is not good. In the second version of the patch (attached) I just moved the definition of loglevel from my_sys.h to m_ctype.h instead. It solves all definition dependencies and does not introduce extra visibility.
=== modified file 'mysys/charset.c' --- mysys/charset.c 2012-08-14 14:23:34 +0000 +++ mysys/charset.c 2013-08-29 12:38:44 +0000 @@ -66,6 +66,9 @@ static my_bool init_state_maps(struct ch if (!(cs->ident_map= ident_map= (uchar*) my_once_alloc(256, MYF(MY_WME)))) return 1;
+ state_map= (uchar *) cs->state_map; + ident_map= (uchar *) cs->ident_map;
Why is the above needed? They are already set one line above. (In addition cs->state_map and cs->ident_map are already uchar *)
Thanks for noticing this. Removed.
--- strings/ctype-simple.c 2013-07-21 14:39:19 +0000 +++ strings/ctype-simple.c 2013-08-29 09:05:07 +0000 @@ -1163,12 +1163,12 @@ static int pcmp(const void * f, const vo return res; }
-static my_bool create_fromuni(struct charset_info_st *cs, - void *(*alloc)(size_t)) +static my_bool +create_fromuni(struct charset_info_st *cs, + MY_CHARSET_LOADER *loader) { uni_idx idx[PLANE_NUM]; int i,n; - struct my_uni_idx_st *tab_from_uni;
/* Check that Unicode map is loaded. @@ -1209,18 +1209,18 @@ static my_bool create_fromuni(struct cha for (i=0; i < PLANE_NUM; i++) { int ch,numchars; - uchar *tab;
/* Skip empty plane */ if (!idx[i].nchars) break;
numchars=idx[i].uidx.to-idx[i].uidx.from+1; - if (!(idx[i].uidx.tab= tab= (uchar*) - alloc(numchars * sizeof(*idx[i].uidx.tab)))) + if (!(idx[i].uidx.tab= (uchar *) + (loader->once_alloc) (numchars * + sizeof(*idx[i].uidx.tab)))) return TRUE;
- bzero(tab,numchars*sizeof(*tab)); + bzero((char*) idx[i].uidx.tab, numchars * sizeof(*idx[i].uidx.tab));
for (ch=1; ch < PLANE_SIZE; ch++) { @@ -1228,32 +1228,32 @@ static my_bool create_fromuni(struct cha if (wc >= idx[i].uidx.from && wc <= idx[i].uidx.to && wc) { int ofs= wc - idx[i].uidx.from; - tab[ofs]= ch; + ((char *) idx[i].uidx.tab)[ofs]= ch; } } }
Why remove tab to be a shortcut for idx[i].uidx.from ? Shouldn't it be faster and shorter to use tab than using idx[i].uidx.from? (At least in this place)
I guess you mean idx[i].uidx.tab. Okey, I restored the variable. But generally I don't like pointer aliases (as it's bug prone). Moreover, having an alias in this code does not improve performance, because this code is called only once per collation during mysqld life time. The best choice would probably be to move the loop inside a function and pass idx[i].uidx.tab into the function.
-static const uint16 page0FCdata[]= { /* FC00 (3 weights per char) */ +static uint16 page0FCdata[]= { /* FC00 (3 weights per char) */ 0x134F,0x135E,0x0000, 0x134F,0x1364,0x0000, 0x134F,0x13B0,0x0000, 0x134F,0x13C7,0x0000, 0x134F,0x13C8,0x0000, 0x1352,0x135E,0x0000, 0x1352,0x1364,0x0000, 0x1352,0x1365,0x0000, 0x1352,0x13B0,0x0000, @@ -6006,7 +6005,7 @@ static const uint16 page0FCdata[]= { /* 0x1381,0x13C8,0x0000, 0x1382,0x13C7,0x0000, 0x1382,0x13C8,0x0000, 0x1364,0x13C7,0x0000 };
If we are not going to change the above, better to keep these as const!
They will then be in the const segment and you will have protection from anyone trying to change these!
Changed back to const. I had to do two casts though, in this variable definition, to resolve const/non-const conflict: MY_UCA_INFO my_uca_v400= { { { 0xFFFF, /* maxchar */ (uchar *) uca_length, (uint16 **) uca_weight, { /* Contractions: */ 0, /* nitems */ NULL, /* item */ NULL /* flags */ } }, }, which should be fine.
<cut>
+my_uca_add_contraction(MY_CONTRACTIONS *list, my_wc_t *wc, size_t len, + my_bool with_context) { - MY_CONTRACTIONS *list= (MY_CONTRACTIONS*) cs->contractions; MY_CONTRACTION *next= &list->item[list->nitems]; - DBUG_ASSERT(len == 2); /* We currently support only contraction2 */ - next->ch[0]= wc[0]; - next->ch[1]= wc[1]; + size_t i; + /* + Contraction is always at least 2 characters. + Contraction is never longer than MY_UCA_MAX_CONTRACTION, + which is guaranteed by using my_coll_rule_expand() with proper limit. + */ + DBUG_ASSERT(len > 1 && len <= MY_UCA_MAX_CONTRACTION); + for (i= 0; i < len; i++) + { + /* + We don't support contractions with U+0000. + my_coll_rule_expand() guarantees there're no U+0000 in a contraction. + */ + DBUG_ASSERT(wc[i] != 0); + next->ch[i]= wc[i]; + } + if (i < MY_UCA_MAX_CONTRACTION) + next->ch[i]= 0; /* Add end-of-line marker */
Wouldn't it make life easier to always have the marker there? (Ie, always have place for the marker). At least you can remove one if from the code. If there is more than one if to remove, then it's a simple optimzation do to...
The Unicode Collation Algorithm tables are quite huge. See ctype-uca.c. Adding a space for 0-terminator for every character weight would make the tables even huger. Not sure if we should do this. Originally, in mysql-4.1 times, I intentionally made this to reduce table sizes.
<cut>
+static int +my_coll_parser_too_long_error(MY_COLL_RULE_PARSER *p, const char *name) +{ + my_snprintf(p->errstr, sizeof(p->errstr), "%s is too long", name); + return 0; +}
You should limit '%s' to 64 characters so that one gets the name in the output even if it's over sizeof()...
%-.64s ....
The list of possible values that are passed to this functions is: "Contraction" "Expansion" "Context" "logical position" No needs for %-.64s.
<cut>
Regards, Monty
Hi, I need help. After merge galera-10.0 with 10.0, rollback asserts. I seem not to be able to find the actual reason. I added some extra output on log.cc to dump out the cache_log data from both trx_cache and stmt_cache, but to me these values do not really mean anything, here some values from the beginning: I do not follow why those pos values are so big ? 130919 9:25:51 [ERROR] WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 140514620666192 current_pos 140514620731584 request_pos 140514620731688 write_pos 67383728 130919 9:25:51 [ERROR] WSREP: ::jan_COMMIT:: STMT: STMT: Pending (nil) pos 0, pos_in_file 140514620665776 current_pos 140514620698688 request_pos 140514620698688 write_pos 67383728 130919 9:25:51 [ERROR] WSREP: ::jan_BINLOG_COMMIT_FLUSH_TRX_CACHE:: TRX: Pending (nil) pos 0, pos_in_file 140514620666192 current_pos 140514620731584 request_pos 140514620731688 write_pos 18263408 130919 9:25:51 [ERROR] WSREP: ::jan_BINLOG_COMMIT_FLUSH_TRX_CACHE:: STMT: Pending (nil) pos 0, pos_in_file 140514620665776 current_pos 140514620698688 request_pos 140514620698688 write_pos 18263408 130919 9:25:51 [ERROR] WSREP: ::jan_BINLOG_FLUSH_CACHE:: TRX: Pending (nil) pos 0, pos_in_file 140514620666192 current_pos 140514620731584 request_pos 140514620731688 write_pos 18571483 130919 9:25:51 [ERROR] WSREP: ::jan_BINLOG_FLUSH_CACHE:: STMT: Pending (nil) pos 0, pos_in_file 140514620665776 current_pos 140514620698688 request_pos 140514620698688 write_pos 18571483 130919 9:25:51 [ERROR] WSREP: ::jan_reset:: Pending (nil) pos 0, pos_in_file 140514620666240 current_pos 140514620731584 request_pos 140514620731584 write_pos 140516368466480 130919 9:25:51 [ERROR] WSREP: ::jan_reset:: Pending (nil) pos 0, pos_in_file 140514620666240 current_pos 140514620731584 request_pos 140514620731584 write_pos 11 130919 9:26:03 [ERROR] WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 140514620666192 current_pos 140514620731584 request_pos 140514620731724 write_pos 67383728 130919 9:26:03 [ERROR] WSREP: ::jan_COMMIT:: STMT: STMT: Pending (nil) pos 0, pos_in_file 140514620665776 current_pos 140514620698688 request_pos 140514620698688 write_pos 67383728 130919 9:26:03 [ERROR] WSREP: ::jan_BINLOG_COMMIT_FLUSH_TRX_CACHE:: TRX: Pending (nil) pos 0, pos_in_file 140514620666192 current_pos 140514620731584 request_pos 140514620731724 write_pos 18263408 130919 9:26:03 [ERROR] WSREP: ::jan_BINLOG_COMMIT_FLUSH_TRX_CACHE:: STMT: Pending (nil) pos 0, pos_in_file 140514620665776 current_pos 140514620698688 request_pos 140514620698688 write_pos 18263408 130919 9:26:03 [ERROR] WSREP: ::jan_BINLOG_FLUSH_CACHE:: TRX: Pending (nil) pos 0, pos_in_file 140514620666192 current_pos 140514620731584 request_pos 140514620731724 write_pos 18571483 130919 9:26:03 [ERROR] WSREP: ::jan_BINLOG_FLUSH_CACHE:: STMT: Pending (nil) pos 0, pos_in_file 140514620665776 current_pos 140514620698688 request_pos 140514620698688 write_pos 18571483 130919 9:26:03 [ERROR] WSREP: ::jan_reset:: Pending (nil) pos 0, pos_in_file 140514620666240 current_pos 140514620731584 request_pos 140514620731584 write_pos 140516368466480 130919 9:26:03 [ERROR] WSREP: ::jan_reset:: Pending (nil) pos 0, pos_in_file 140514620666240 current_pos 140514620731584 request_pos 140514620731584 write_pos 11 130919 9:26:10 [ERROR] WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 140514620666192 current_pos 140514620731584 request_pos 140514620731706 write_pos 67383728 130919 9:26:10 [ERROR] WSREP: ::jan_COMMIT:: STMT: STMT: Pending (nil) pos 0, pos_in_file 140514620665776 current_pos 140514620698688 request_pos 140514620698688 write_pos 67383728 130919 9:26:10 [ERROR] WSREP: ::jan_BINLOG_COMMIT_FLUSH_TRX_CACHE:: TRX: Pending (nil) pos 0, pos_in_file 140514620666192 current_pos 140514620731584 request_pos 140514620731706 write_pos 18263408 130919 9:26:10 [ERROR] WSREP: ::jan_BINLOG_COMMIT_FLUSH_TRX_CACHE:: STMT: Pending (nil) pos 0, pos_in_file 140514620665776 current_pos 140514620698688 request_pos 140514620698688 write_pos 18263408 130919 9:26:10 [ERROR] WSREP: ::jan_BINLOG_FLUSH_CACHE:: TRX: Pending (nil) pos 0, pos_in_file 140514620666192 current_pos 140514620731584 request_pos 140514620731706 write_pos 18571483 130919 9:26:10 [ERROR] WSREP: ::jan_BINLOG_FLUSH_CACHE:: STMT: Pending (nil) pos 0, pos_in_file 140514620665776 current_pos 140514620698688 request_pos 140514620698688 write_pos 18571483 Still the test assert on the exactly the same place i.e.: 130919 9:28:40 [ERROR] WSREP: ::jan_COMMIT:: STMT: STMT: Pending (nil) pos 0, pos_in_file 140514418699920 current_pos 140514418701792 request_pos 140514418701792 write_pos 140514418513168 130919 9:28:41 [ERROR] WSREP: ::jan_BINLOG_ROLLBACK::TRX: Pending (nil) pos 11936128518282651045, pos_in_file 140514418700384 current_pos 140514418735008 request_pos 140514418738328 write_pos 16516442 130919 9:28:41 [ERROR] WSREP: ::jan_BINLOG_ROLLBACK:: STMT: Pending (nil) pos 0, pos_in_file 140514418699920 current_pos 140514418701792 request_pos 140514418701792 write_pos 16516442 130919 9:28:41 [ERROR] WSREP: ::jan_BINLOG_TRUNCATE_TRX_CACHE:: Pending (nil) pos 11936128518282651045, pos_in_file 140514418700384 current_pos 140514418735008 request_pos 140514418738328 write_pos 140514418494000 130919 9:28:41 [ERROR] WSREP: ::jan_BINLOG_TRUNCATE_TRX_CACHE:: STMT: Pending (nil) pos 0, pos_in_file 140514418699920 current_pos 140514418701792 request_pos 140514418701792 write_pos 140514418494000 130919 9:28:41 [ERROR] WSREP: ::jan_reset:: Pending (nil) pos 11936128518282651045, pos_in_file 140514418700384 current_pos 140514418735008 request_pos 140514418738328 write_pos 18571483 mysqld: /home/jan/mysql/galera-10.0/sql/log.cc:295: void binlog_cache_data::reset(): Assertion `empty()' failed. 130919 9:28:41 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see http://kb.askmonty.org/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.0.4-MariaDB-debug-log key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=7 max_threads=153 thread_count=7 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467201 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0x4d68d40 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7fcc84314e00 thread_stack 0x48000 mysys/stacktrace.c:246(my_print_stacktrace)[0xe3cd92] sql/signal_handler.cc:155(handle_fatal_signal)[0x867dab] ??:0(??)[0x7fcc96384bd0] ??:0(??)[0x7fcc957d8037] ??:0(??)[0x7fcc957db698] ??:0(??)[0x7fcc957d0e03] ??:0(??)[0x7fcc957d0eb2] sql/log.cc:296(binlog_cache_data::reset())[0x94a533] sql/log.cc:464(binlog_cache_mngr::reset(bool, bool))[0x94a8ec] sql/log.cc:2008(binlog_truncate_trx_cache)[0x936265] sql/log.cc:2205(binlog_rollback)[0x936d7b] sql/handler.cc:1600(ha_rollback_trans(THD*, bool))[0x86b625] sql/transaction.cc:321(trans_rollback(THD*))[0x7a0cbb] sql/sql_parse.cc:4638(mysql_execute_command(THD*))[0x671889] sql/sql_parse.cc:6820(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x677d28] sql/sql_parse.cc:6653(wsrep_mysql_parse)[0x6774a9] sql/sql_parse.cc:1459(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x6688a4] sql/sql_parse.cc:1080(do_command(THD*))[0x66787a] sql/sql_connect.cc:1404(do_handle_one_connection(THD*))[0x78baa9] sql/sql_connect.cc:1312(handle_one_connection)[0x78b7e9] perfschema/pfs.cc:1855(pfs_spawn_thread)[0xb0d40e] ??:0(??)[0x7fcc9637cf8e] ??:0(??)[0x7fcc9589ae1d] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (0x7fcc10005288): ROLLBACK Connection ID (thread ID): 11 Status: NOT_KILLED -- Jan Lindström Principal Engineer MariaDB | MaxScale | skype: jan_p_lindstrom www.skysql.com <http://www.skysql.com/> Twitter <http://twitter.com/skysql> Blog <http://www.skysql.com/blog/> Facebook <http://www.facebook.com/skysql> LinkedIn <http://www.linkedin.com/company/1214250> Google+ <https://plus.google.com/117544963211695643458/posts>
Jan Lindström <jplindst@mariadb.org> writes:
I need help. After merge galera-10.0 with 10.0, rollback asserts. I seem not to be able to find the actual reason. I added some extra output on log.cc to dump out the cache_log data from both trx_cache and stmt_cache, but to me these values do not really mean anything, here some values from the beginning:
I do not follow why those pos values are so big ?
Indeed. This strongly suggests that some memory corruption, or access of uninitialised memory, is happening. Did you manage to repeat it with Valgrind? If the problem is memory corruption, it is really hard to tell the origin of the problem from the information given here - the real problem could be in a completely different part of the code... - Kristian.
130919 9:25:51 [ERROR] WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 140514620666192 current_pos 140514620731584 request_pos 140514620731688 write_pos 67383728
Hi,
Indeed. This strongly suggests that some memory corruption, or access of uninitialised memory, is happening.
Did you manage to repeat it with Valgrind? If the problem is memory corruption, it is really hard to tell the origin of the problem from the information given here - the real problem could be in a completely different part of the code...
Yes, I did run using valgrind, while it reported some uninitialized variables, not really any memory corruptions. Attached, the log from node0. Similarly, debug=+d did not help. Where I should add some output to narrow down the place where the corruption happens. R: Jan -- Jan Lindström Principal Engineer MariaDB | MaxScale | skype: jan_p_lindstrom www.skysql.com <http://www.skysql.com/> Twitter <http://twitter.com/skysql> Blog <http://www.skysql.com/blog/> Facebook <http://www.facebook.com/skysql> LinkedIn <http://www.linkedin.com/company/1214250> Google+ <https://plus.google.com/117544963211695643458/posts>
Jan Lindström <jplindst@mariadb.org> writes:
Yes, I did run using valgrind, while it reported some uninitialized variables, not really any memory corruptions. Attached, the log from node0. Similarly,
Well, uninitialised variables are exactly what I would expect could lead to things looking like you have shown so far. So if you have Valgrind errors, you should definitely start by fixing them. However, I see no messages from Valgrind in the log you attached?!? So please, repeat the exact problem you get (assertion during rollback) while running with Valgrind. Provide the exact command you used to get the error in this particular run, and the correct log that shows the Valgrind errors about uninitialised variables, and I can give more suggestions for how to fix.
debug=+d did not help. Where I should add some output to narrow down the place where the corruption happens.
What do you mean "did not help"? Did it not work to produce any output? Maybe a different syntax is needed (I usually enable it using ./mtr --debug, but I did test using the above syntax directly). Once you get dbug output enabled, you get a file containing the output from every DBUG_ENTER() and DBUG_PRINT() statement in the code (likely many megabytes worth). This will help show what the code was doing before it crashed, so we can backtrack in the source and try to find the root of the issue. So again, repeat the exact problem you get with dbug enabled. Provide the exact command you use to repeat the problem in this exact run. And provide the resolved stack trace in GDB of the exact same failure run. And upload the DBUG output somewhere (it will be too big to mail). Then I can take a look and see if I can come up with more suggestions. - Kristian.
Hi, I repeated the exact problem. Firstly, you need to have this branch: jan@jan-GE70-0NC-0ND ~/mysql/mariadb-patches $ bzr info Standalone tree (format: 2a) Location: branch root: . Related branches: parent branch: bzr+ssh://bazaar.launchpad.net/~elenst/randgen/mariadb-patches/ Inside that directory I run following command : perl ./runall-new.pl --grammar=conf/engines/engine_stress.yy --gendata=conf/engines/engine_stress.zz --duration=600 --queries=100M --threads=4 --galera=mmm --basedir=/home/jan/mysql/galera-10.0 --vardir=/home/jan/mysql/galera-test --mysqld=--wsrep-provider=/usr/lib/libgalera_smm.so --debug --mysqld=--general-log=1 --mysqld=--general_log_file=/home/jan/mysql/galera-test/stmt.log --logfile=/home/jan/mysql/galera-test/rpg.log --valgrind Correct log file attached.
Yes, I did run using valgrind, while it reported some uninitialized variables, not really any memory corruptions. Attached, the log from node0. Similarly, Well, uninitialised variables are exactly what I would expect could lead to things looking like you have shown so far. So if you have Valgrind errors, you should definitely start by fixing them.
However, I see no messages from Valgrind in the log you attached?!?
So please, repeat the exact problem you get (assertion during rollback) while running with Valgrind. Provide the exact command you used to get the error in this particular run, and the correct log that shows the Valgrind errors about uninitialised variables, and I can give more suggestions for how to fix.
debug=+d did not help. Where I should add some output to narrow down the place where the corruption happens.
It did produce the output, not yet found the reason, some of the output just before corruption MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): enter: event: 0x7f2a200bc800 MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): info: cache_mngr->pending(): 0x0 Rows_log_event::do_add_row_data: enter: row_data: 0x7f2a200cb248 length: 10 Rows_log_event::do_add_row_data: row_data: Memory: 0x7f2a200cb248 Bytes: (10) F8 01 00 00 00 01 74 02 77 65 my_realloc: my: ptr: 0x0 size: 1024 my_flags: 80 my_malloc: my: size: 1024 my_flags: 80 my_malloc: exit: ptr: 0x7f2a200895e0 THD::binlog_query: enter: qtype: ROW query: 'INSERT /*! IGNORE */ INTO table1_int_autoinc VALUES (NULL, 't', 'we', NULL, NULL)' MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): enter: event: 0x0 MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): info: cache_mngr->pending(): 0x7f2a200bc800 Log_event::write_header: enter: filepos: 65 length: 20 type: 23 MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): m_width: Memory: 0x7f2a882e0190 Bytes: (1) 05 MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): m_cols: Memory: 0x7f2a200bc8c8 Bytes: (1) FF MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): rows: Memory: 0x7f2a200895e0 Bytes: (10) F8 01 00 00 00 01 74 02 77 65 my_free: my: ptr: 0x7f2a200895e0 sf_malloc_usable_size: exit: size: 1024 flags: 80 my_free: my: ptr: 0x7f2a200bc800 sf_malloc_usable_size: exit: size: 304 flags: 24 THD::binlog_query: debug: is_current_stmt_binlog_format_row: 1 mysql_insert: THD::enter_stage: /home/jan/mysql/galera-10.0/sql/sql_insert.cc:1078 alloc_root: enter: root: 0x39a4640 alloc_root: exit: ptr: 0x7f2a20005bf8 mysql_execute_command: THD::enter_stage: /home/jan/mysql/galera-10.0/sql/sql_parse.cc:5323 trans_register_ha: enter: stmt ha_commit_trans: info: all: 0 thd->in_sub_stmt: 0 ha_info: 0x39a14c8 is_real_trans: 1 wsrep_run_wsrep_commit: wsrep: replicating commit wsrep_run_wsrep_commit: mutex: LOCK_wsrep_thd (0x39a2fe0) locking wsrep_run_wsrep_commit: mutex: LOCK_wsrep_thd (0x39a2fe0) locked wsrep_run_wsrep_commit: mutex: LOCK_wsrep_replaying (0x18ed380) locking my_malloc: my: size: 192 my_flags: 24 my_malloc: exit: ptr: 0x7f2a200bc800 my_hash_init: enter: hash: 0x7f2a200bc800 size: 128 my_malloc: my: size: 2048 my_flags: 0 my_malloc: exit: ptr: 0x7f2a20082e60 my_hash_init: enter: hash: 0x7f2a200bc860 size: 128 my_malloc: my: size: 2048 my_flags: 0 my_malloc: exit: ptr: 0x7f2a200b7f70 my_malloc: my: size: 56 my_flags: 56 my_malloc: exit: ptr: 0x7f2a200cedd0 add_to_locked_mutex: info: inserting 0x7f2a200cedd0 into 0x39a2fe0 (id: 284 -> 254) wsrep_run_wsrep_commit: mutex: LOCK_wsrep_replaying (0x18ed380) locked wsrep_run_wsrep_commit: mutex: LOCK_wsrep_replaying (0x18ed380) unlocking wsrep_run_wsrep_commit: mutex: LOCK_wsrep_thd (0x39a2fe0) unlocking reinit_io_cache: enter: cache: 0x7f2a200cf5e0 type: 1 seek_offset: 0 clear_cache: 0 my_malloc: my: size: 104 my_flags: 0 my_malloc: exit: ptr: 0x7f2a200bc940 my_free: my: ptr: 0x7f2a200bc940 sf_malloc_usable_size: exit: size: 104 flags: 0 wsrep_run_wsrep_commit: mutex: LOCK_wsrep_thd (0x39a2fe0) locking wsrep_run_wsrep_commit: mutex: LOCK_wsrep_thd (0x39a2fe0) locked wsrep_run_wsrep_commit: wsrep: replicating commit success wsrep_run_wsrep_commit: mutex: LOCK_wsrep_thd (0x39a2fe0) unlocking print_buffer_to_file: enter: buffer: WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 139818903074288 current_pos 139818903139504 request_pos 139818903139608 write_pos 49062320 R: -- Jan Lindström Principal Engineer MariaDB | MaxScale | skype: jan_p_lindstrom www.skysql.com <http://www.skysql.com/> Twitter <http://twitter.com/skysql> Blog <http://www.skysql.com/blog/> Facebook <http://www.facebook.com/skysql> LinkedIn <http://www.linkedin.com/company/1214250> Google+ <https://plus.google.com/117544963211695643458/posts>
Jan Lindström <jplindst@mariadb.org> writes:
Correct log file attached.
==00:00:02:33.708 25289== Use of uninitialised value of size 8 ==00:00:02:33.708 25289== at 0x5EF865B: _itoa_word (_itoa.c:179) ==00:00:02:33.708 25289== by 0x5EFCB91: vfprintf (vfprintf.c:1654) ==00:00:02:33.708 25289== by 0x5F22654: vsnprintf (vsnprintf.c:119) ==00:00:02:33.708 25289== by 0x5F03141: snprintf (snprintf.c:34) ==00:00:02:33.708 25289== by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054) ==00:00:02:33.708 25289== by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516) ==00:00:02:33.708 25289== by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436) ==00:00:02:33.709 25289== by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363) ==00:00:02:33.709 25289== by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373) ==00:00:02:33.709 25289== by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820) ==00:00:02:33.709 25289== by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651) ==00:00:02:33.709 25289== by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454) ==00:00:02:33.709 25289== ==00:00:02:33.709 25289== Conditional jump or move depends on uninitialised value(s) ==00:00:02:33.709 25289== at 0x5EF8665: _itoa_word (_itoa.c:179) ==00:00:02:33.709 25289== by 0x5EFCB91: vfprintf (vfprintf.c:1654) ==00:00:02:33.709 25289== by 0x5F22654: vsnprintf (vsnprintf.c:119) ==00:00:02:33.709 25289== by 0x5F03141: snprintf (snprintf.c:34) ==00:00:02:33.709 25289== by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054) ==00:00:02:33.709 25289== by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516) ==00:00:02:33.709 25289== by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436) ==00:00:02:33.709 25289== by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363) ==00:00:02:33.709 25289== by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373) ==00:00:02:33.709 25289== by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820) ==00:00:02:33.709 25289== by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651) ==00:00:02:33.709 25289== by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454) ==00:00:02:33.709 25289== 130920 13:53:04 [ERROR] WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 892097568 current_pos 892132560 request_pos 892132664 write_pos 303074896
So this is the explanation for the strange numbers in your printouts, like 892097568. Those values are uninitialised memory (probably from uninitialised stack variables?). Since the assertion you get is exactly about the values of those IO_CACHE values being wrong, this looks like something to investigate. I am however curious about why there is only a single Valgrind error about this? As there are many of those unexpected numbers in your output. However, most of the Valgrind errors seem to be in Galera code. It is hard to say for me if they are related. Seppo, can you look into that from your side? Normally, we strive _very_ hard to make MariaDB code clean in Valgrind...
It did produce the output, not yet found the reason, some of the output just before corruption
MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): enter: event: 0x7f2a200bc800 MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): info: cache_mngr-> pending(): 0x0 Rows_log_event::do_add_row_data: enter: row_data: 0x7f2a200cb248 length: 10 Rows_log_event::do_add_row_data: row_data: Memory: 0x7f2a200cb248 Bytes: (10) F8 01 00 00 00 01 74 02 77 65 my_realloc: my: ptr: 0x0 size: 1024 my_flags: 80 my_malloc: my: size: 1024 my_flags: 80 my_malloc: exit: ptr: 0x7f2a200895e0
I need to see the *complete* log to be able to help. Yes, I know it can be hundreds of megabyte maybe, but it should compress well. Upload it somewhere I can access it and I can take a look. Using this log file, we can deduce exactly which paths the code was taking as it got to that assert. I'm wondering if the ROLLBACK happens early and ends up executing some binlog code at a point where transaction start has not properly initialised the necessary data structures. So we need the dbug output from the entire execution since transaction start. (If you can get the error log/Valgrind output in the same log file as the dbug, that would make things much easier. But if not, I can still try to decipher it...) - Kristian.
On 2013-09-20 15:51, Kristian Nielsen wrote:
Jan Lindström <jplindst@mariadb.org> writes:
Correct log file attached.
==00:00:02:33.708 25289== Use of uninitialised value of size 8 ==00:00:02:33.708 25289== at 0x5EF865B: _itoa_word (_itoa.c:179) ==00:00:02:33.708 25289== by 0x5EFCB91: vfprintf (vfprintf.c:1654) ==00:00:02:33.708 25289== by 0x5F22654: vsnprintf (vsnprintf.c:119) ==00:00:02:33.708 25289== by 0x5F03141: snprintf (snprintf.c:34) ==00:00:02:33.708 25289== by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054) ==00:00:02:33.708 25289== by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516) ==00:00:02:33.708 25289== by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436) ==00:00:02:33.709 25289== by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363) ==00:00:02:33.709 25289== by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373) ==00:00:02:33.709 25289== by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820) ==00:00:02:33.709 25289== by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651) ==00:00:02:33.709 25289== by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454) ==00:00:02:33.709 25289== ==00:00:02:33.709 25289== Conditional jump or move depends on uninitialised value(s) ==00:00:02:33.709 25289== at 0x5EF8665: _itoa_word (_itoa.c:179) ==00:00:02:33.709 25289== by 0x5EFCB91: vfprintf (vfprintf.c:1654) ==00:00:02:33.709 25289== by 0x5F22654: vsnprintf (vsnprintf.c:119) ==00:00:02:33.709 25289== by 0x5F03141: snprintf (snprintf.c:34) ==00:00:02:33.709 25289== by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054) ==00:00:02:33.709 25289== by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516) ==00:00:02:33.709 25289== by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436) ==00:00:02:33.709 25289== by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363) ==00:00:02:33.709 25289== by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373) ==00:00:02:33.709 25289== by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820) ==00:00:02:33.709 25289== by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651) ==00:00:02:33.709 25289== by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454) ==00:00:02:33.709 25289== 130920 13:53:04 [ERROR] WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 892097568 current_pos 892132560 request_pos 892132664 write_pos 303074896
So this is the explanation for the strange numbers in your printouts, like 892097568. Those values are uninitialised memory (probably from uninitialised stack variables?).
Since the assertion you get is exactly about the values of those IO_CACHE values being wrong, this looks like something to investigate.
I am however curious about why there is only a single Valgrind error about this? As there are many of those unexpected numbers in your output.
However, most of the Valgrind errors seem to be in Galera code. It is hard to say for me if they are related. Seppo, can you look into that from your side? Normally, we strive _very_ hard to make MariaDB code clean in Valgrind...
Kristian, so do we. Please make sure you're using debug Galera build and have GLIBCXX_FORCE_NEW environment variable exported (http://valgrind.org/docs/manual/faq.html#faq.reports) Regards, Alex
It did produce the output, not yet found the reason, some of the output just before corruption
MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): enter: event: 0x7f2a200bc800 MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): info: cache_mngr-> pending(): 0x0 Rows_log_event::do_add_row_data: enter: row_data: 0x7f2a200cb248 length: 10 Rows_log_event::do_add_row_data: row_data: Memory: 0x7f2a200cb248 Bytes: (10) F8 01 00 00 00 01 74 02 77 65 my_realloc: my: ptr: 0x0 size: 1024 my_flags: 80 my_malloc: my: size: 1024 my_flags: 80 my_malloc: exit: ptr: 0x7f2a200895e0
I need to see the *complete* log to be able to help. Yes, I know it can be hundreds of megabyte maybe, but it should compress well. Upload it somewhere I can access it and I can take a look.
Using this log file, we can deduce exactly which paths the code was taking as it got to that assert. I'm wondering if the ROLLBACK happens early and ends up executing some binlog code at a point where transaction start has not properly initialised the necessary data structures. So we need the dbug output from the entire execution since transaction start.
(If you can get the error log/Valgrind output in the same log file as the dbug, that would make things much easier. But if not, I can still try to decipher it...)
- Kristian.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
-- Alexey Yurchenko, Codership Oy, www.codership.com Skype: alexey.yurchenko, Phone: +358-400-516-011
Hi, Full unedited log using --valgrind and --mysqld=--debug=+d at location: https://www.dropbox.com/s/dcufp7l1cch8dfs/mysql.err.gz In below lines I have: static int binlog_commit(handlerton *hton, THD *thd, bool all) { int error= 0; DBUG_ENTER("binlog_commit"); binlog_cache_mngr *const cache_mngr= (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton); #ifdef WITH_WSREP if (!cache_mngr) DBUG_RETURN(0); #endif /* WITH_WSREP */ WSREP_ERROR("::jan_BINLOG_COMMIT:: TRX: Pending %p pos %llu, pos_in_file %llu current_pos %llu request_pos %llu write_pos %llu", cache_mngr->trx_cache.pending(), cache_mngr->trx_cache.cache_log.pos_in_file, cache_mngr->trx_cache.cache_log.current_pos, cache_mngr->trx_cache.cache_log.request_pos, cache_mngr->trx_cache.cache_log.write_pos); WSREP_ERROR("::jan_COMMIT:: STMT: STMT: Pending %p pos %llu, pos_in_file %llu current_pos %llu request_pos %llu write_pos %llu", cache_mngr->stmt_cache.pending(), cache_mngr->stmt_cache.cache_log.pos_in_file, cache_mngr->stmt_cache.cache_log.current_pos, cache_mngr->stmt_cache.cache_log.request_pos, cache_mngr->stmt_cache.cache_log.write_pos); And that macro is // MySQL logging functions don't seem to understand long long length modifer. // This is a workaround. It also prefixes all messages with "WSREP" #define WSREP_LOG(fun, ...) \ { \ char msg[1024] = {'\0'}; \ snprintf(msg, sizeof(msg) - 1, ## __VA_ARGS__); \ fun("WSREP: %s", msg); \ } #define WSREP_DEBUG(...) \ if (wsrep_debug) WSREP_LOG(sql_print_information, ##__VA_ARGS__) #define WSREP_INFO(...) WSREP_LOG(sql_print_information, ##__VA_ARGS__) #define WSREP_WARN(...) WSREP_LOG(sql_print_warning, ##__VA_ARGS__) #define WSREP_ERROR(...) WSREP_LOG(sql_print_error, ##__VA_ARGS__) Thus , it seems that trx_cache and stmt_cache are not initialized, where that should happen ? > Jan Lindström <jplindst@mariadb.org> writes: > >> Correct log file attached. >> ==00:00:02:33.708 25289== Use of uninitialised value of size 8 >> ==00:00:02:33.708 25289== at 0x5EF865B: _itoa_word (_itoa.c:179) >> ==00:00:02:33.708 25289== by 0x5EFCB91: vfprintf (vfprintf.c:1654) >> ==00:00:02:33.708 25289== by 0x5F22654: vsnprintf (vsnprintf.c:119) >> ==00:00:02:33.708 25289== by 0x5F03141: snprintf (snprintf.c:34) >> ==00:00:02:33.708 25289== by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054) >> ==00:00:02:33.708 25289== by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516) >> ==00:00:02:33.708 25289== by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436) >> ==00:00:02:33.709 25289== by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363) >> ==00:00:02:33.709 25289== by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373) >> ==00:00:02:33.709 25289== by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820) >> ==00:00:02:33.709 25289== by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651) >> ==00:00:02:33.709 25289== by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454) >> ==00:00:02:33.709 25289== >> ==00:00:02:33.709 25289== Conditional jump or move depends on uninitialised value(s) >> ==00:00:02:33.709 25289== at 0x5EF8665: _itoa_word (_itoa.c:179) >> ==00:00:02:33.709 25289== by 0x5EFCB91: vfprintf (vfprintf.c:1654) >> ==00:00:02:33.709 25289== by 0x5F22654: vsnprintf (vsnprintf.c:119) >> ==00:00:02:33.709 25289== by 0x5F03141: snprintf (snprintf.c:34) >> ==00:00:02:33.709 25289== by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054) >> ==00:00:02:33.709 25289== by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516) >> ==00:00:02:33.709 25289== by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436) >> ==00:00:02:33.709 25289== by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363) >> ==00:00:02:33.709 25289== by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373) >> ==00:00:02:33.709 25289== by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820) >> ==00:00:02:33.709 25289== by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651) >> ==00:00:02:33.709 25289== by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454) >> ==00:00:02:33.709 25289== >> 130920 13:53:04 [ERROR] WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 892097568 current_pos 892132560 request_pos 892132664 write_pos 303074896 > So this is the explanation for the strange numbers in your printouts, like > 892097568. Those values are uninitialised memory (probably from uninitialised > stack variables?). > > Since the assertion you get is exactly about the values of those IO_CACHE > values being wrong, this looks like something to investigate. > > I am however curious about why there is only a single Valgrind error about > this? As there are many of those unexpected numbers in your output. > > However, most of the Valgrind errors seem to be in Galera code. It is hard to > say for me if they are related. Seppo, can you look into that from your side? > Normally, we strive _very_ hard to make MariaDB code clean in Valgrind... > >> It did produce the output, not yet found the reason, some of the output just >> before corruption >> >> MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): enter: event: >> 0x7f2a200bc800 >> MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): info: cache_mngr-> >> pending(): 0x0 >> Rows_log_event::do_add_row_data: enter: row_data: 0x7f2a200cb248 length: 10 >> Rows_log_event::do_add_row_data: row_data: Memory: 0x7f2a200cb248 Bytes: (10) >> F8 01 00 00 00 01 74 02 77 65 >> my_realloc: my: ptr: 0x0 size: 1024 my_flags: 80 >> my_malloc: my: size: 1024 my_flags: 80 >> my_malloc: exit: ptr: 0x7f2a200895e0 > I need to see the *complete* log to be able to help. Yes, I know it can be > hundreds of megabyte maybe, but it should compress well. Upload it somewhere > I can access it and I can take a look. > > Using this log file, we can deduce exactly which paths the code was taking as > it got to that assert. I'm wondering if the ROLLBACK happens early and ends up > executing some binlog code at a point where transaction start has not properly > initialised the necessary data structures. So we need the dbug output from the > entire execution since transaction start. > > (If you can get the error log/Valgrind output in the same log file as the > dbug, that would make things much easier. But if not, I can still try to > decipher it...) > > - Kristian. R: -- Jan Lindström Principal Engineer MariaDB | MaxScale | skype: jan_p_lindstrom www.skysql.com <http://www.skysql.com/> Twitter <http://twitter.com/skysql> Blog <http://www.skysql.com/blog/> Facebook <http://www.facebook.com/skysql> LinkedIn <http://www.linkedin.com/company/1214250> Google+ <https://plus.google.com/117544963211695643458/posts>
Jan Lindström <jplindst@mariadb.org> writes:
Full unedited log using --valgrind and --mysqld=--debug=+d at location: https:/ /www.dropbox.com/s/dcufp7l1cch8dfs/mysql.err.gz
Ok, thanks. Hm, this log looks different from what I am used to seeing when I use mysql-test-run.pl --debug. From looking at the mysql-test-run.pl code it seems I should have asked for --debug=d,*:t:i which will show function nesting and thread IDs, missing from the current log. If you can make a new log using the above command then it would be useful. But meanwhile, I will see if I can figure something out from the log you sent. I am looking at the tree lp:~maria-captains/maria/maria-10.0-galera/ , is that still the current tree? - Kristian.
Ok, so I analysed things and found something. The direct cause of the assert is that reinit_io_cache() fails, as seen in the logs: reinit_io_cache: enter: cache: 0x3cee63b0 type: 2 seek_offset: 0 clear_cache: 0 my_b_flush_io_cache: enter: cache: 0x3cee63b0 my_seek: my: fd: 58 Pos: 11936128518282651045 Whence: 0 MyFlags: 16 my_error: my: nr: 33 MyFlags: 0 errno: 22 my_message_sql: error: error: 33 message: 'Can't seek in file '/home/jan/mysql/galera-test/node0/tmp/ML7E3rnl' (Errcode: 22 "Invalid argument")' Flag: 0 my_seek: error: lseek: 18446744073709551615 errno: 22 Error handling is missing when the binlog code calls reinit_io_cache(), so the error is ignored, and it proceeds to trigger the assertion in reset() as the reinit_io_cache() did not complete. The invalid seek offset is 0xa5a5a5a5a5a5a5a5, uninitialised memory. The root cause of this seems to occur much earlier. The bad offset appears a number of times previous in the log, until a ROLLBACK statement happens to trigger the assertion. It appears first at the execution of a ROLLBACK TO SAVEPOINT A: dispatch_command: query: ROLLBACK TO SAVEPOINT A reset_current_stmt_binlog_format_row: debug: temporary_tables: no, in_sub_stmt: no, system_thread: NON_SYSTEM_THREAD mysql_reset_thd_for_next_command: debug: is_current_stmt_binlog_format_row(): 1 alloc_root: enter: root: 0x3d064100 alloc_root: exit: ptr: 0x3d2870d8 binlog_trans_log_truncate: enter: pos: 11936128518282651045 binlog_trans_log_truncate: info: truncating to position 11936128518282651045 reinit_io_cache: enter: cache: 0x3cee63b0 type: 2 seek_offset: 11936128518282651045 clear_cache: 0 The wrong offset comes into the binlog code from the upper layer in this handlerton method: static int binlog_savepoint_rollback(handlerton *hton, THD *thd, void *sv) ... binlog_trans_log_truncate(thd, *(my_off_t*)sv); The parameter void *sv is supposed to have been initialised in a prior call to binlog_savepoint_set(), when executing the SAVEPOINT A method. However, as far as I could determine, there is no such matching call, which would explain why *sv is uninitialised. This would be easier to see in the log with --debug=d,*:t:i where all the lines are prefixed with thread ID, but it seems clear enough, there is no prior logged binlog_savepoint_set() using the same io_cache 0x3cee63b0. So it appears the problem is that the test does a ROLLBACK TO SAVEPOINT A without a prior SAVEPOINT A statement. This should have given an error (right?), but instead it seems to end up calling into the binlog with uninitialised data, which corrupts the internal binlog state and eventually leads to the assertion. Does this help you proceed? Maybe you can verify this scenario from the general log. You may also be able to produce a stand-alone test case (by taking stuff from the general log, or writing your own). One transaction does a ROLLBACK TO SAVEPOINT without a matching prior SAVEPOINT. And then later a ROLLBACK (I think in a different transaction, but you can try same transaction also) triggers the actual assertion. Or if not, send the full log with --debug=d,*:t:i, and I can look again. Or maybe you can do so yourself, find the occurence of the invalid seek position and single out the stuff for that thread id only. Hope this helps, - Kristian. Kristian Nielsen <knielsen@knielsen-hq.org> writes:
Jan Lindström <jplindst@mariadb.org> writes:
Full unedited log using --valgrind and --mysqld=--debug=+d at location: https:/ /www.dropbox.com/s/dcufp7l1cch8dfs/mysql.err.gz
Ok, thanks.
Hm, this log looks different from what I am used to seeing when I use mysql-test-run.pl --debug. From looking at the mysql-test-run.pl code it seems I should have asked for
--debug=d,*:t:i
which will show function nesting and thread IDs, missing from the current log.
Hi, Thanks for the help, I do not seem to be able to run with that --debug=d,*:t:i setup, it does not seem to progress. But, I attached a debugger to node0 mysqld and set up a breakpoint on savepoint set and rollback, I see at set: #1 0x000000000086c981 in ha_savepoint (thd=0x41b9d40, sv=0x7f0ac4007328) at /home/jan/mysql/galera-10.0/sql/handler.cc:2065 (gdb) graph display *trans (gdb) p *sv $4 = {prev = 0xa5a5a5a5a5a5a5a5, name = 0x7f0ac4007390 "A", length = 1, ha_list = 0xa5a5a5a5a5a5a5a5, mdl_savepoint = {m_stmt_ticket = 0xa5a5a5a5a5a5a5a5, m_trans_ticket = 0xa5a5a5a5a5a5a5a5}} and in static int binlog_savepoint_set(handlerton *hton, THD *thd, void *sv) { DBUG_ENTER("binlog_savepoint_set"); int error= 1; char buf[1024]; #ifdef WITH_WSREP if (wsrep_emulate_bin_log) DBUG_RETURN(0); #endif /* WITH_WSREP */ wsrep_emulate_bin_log = 1, thus no savepoint is written to bin log and continue in debugger we get #1 0x000000000086c6fe in ha_rollback_to_savepoint (thd=0x47c7d40, sv=0x7fe364007328) at /home/jan/mysql/galera-10.0/sql/handler.cc:2009 (gdb) print *sv $2 = {prev = 0x0, name = 0x7fe364007390 "A", length = 1, ha_list = 0x47c9ba0, mdl_savepoint = {m_stmt_ticket = 0x0, m_trans_ticket = 0x7fe36405b1e0}} where mdl_savepoint.m_trans_ticket (gdb) p *sv->mdl_savepoint.m_trans_ticket $5 = {<MDL_wait_for_subgraph> = {_vptr.MDL_wait_for_subgraph = 0x16ce390 <vtable for MDL_ticket+16>}, next_in_context = 0x7fe36405aec0, prev_in_context = 0x47c7f30, next_in_lock = 0x0, prev_in_lock = 0x7fe364059298, m_type = MDL_SHARED_WRITE, m_duration = MDL_TRANSACTION, m_ctx = 0x47c7e38, m_lock = 0x4659000} (gdb) p sv $6 = (SAVEPOINT *) 0x7fe364007328 (gdb) p ht $7 = (handlerton *) 0x3aa1fb0 (gdb) p ht->savepoint_offset This I can't understand: if ((err= ht->savepoint_rollback(ht, thd, (uchar *)(sv+1)+ht->savepoint_offset))) there (gdb) p ht->savepoint_offset $8 = 0 and $9 = (SAVEPOINT *) 0x7fe364007358 (gdb) p *(sv+1) $10 = {prev = 0xa5a5a5a5a5a5a5a5, name = 0xa5a5a5a5a5a5a5a5 <Address 0xa5a5a5a5a5a5a5a5 out of bounds>, length = 2779096485, ha_list = 0xa5a5a5a5a5a5a5a5, mdl_savepoint = {m_stmt_ticket = 0xa5a5a5a5a5a5a5a5, m_trans_ticket = 0xa5a5a5a5a5a5a5a5}}
Ok, so I analysed things and found something.
The direct cause of the assert is that reinit_io_cache() fails, as seen in the logs:
reinit_io_cache: enter: cache: 0x3cee63b0 type: 2 seek_offset: 0 clear_cache: 0 my_b_flush_io_cache: enter: cache: 0x3cee63b0 my_seek: my: fd: 58 Pos: 11936128518282651045 Whence: 0 MyFlags: 16 my_error: my: nr: 33 MyFlags: 0 errno: 22 my_message_sql: error: error: 33 message: 'Can't seek in file '/home/jan/mysql/galera-test/node0/tmp/ML7E3rnl' (Errcode: 22 "Invalid argument")' Flag: 0 my_seek: error: lseek: 18446744073709551615 errno: 22
Error handling is missing when the binlog code calls reinit_io_cache(), so the error is ignored, and it proceeds to trigger the assertion in reset() as the reinit_io_cache() did not complete.
Should I do a bug report to you or can this handled in some other MDEV ? R: -- Jan Lindström Principal Engineer MariaDB | MaxScale | skype: jan_p_lindstrom www.skysql.com <http://www.skysql.com/> Twitter <http://twitter.com/skysql> Blog <http://www.skysql.com/blog/> Facebook <http://www.facebook.com/skysql> LinkedIn <http://www.linkedin.com/company/1214250> Google+ <https://plus.google.com/117544963211695643458/posts>
Jan Lindström <jplindst@mariadb.org> writes:
static int binlog_savepoint_set(handlerton *hton, THD *thd, void *sv) {
#ifdef WITH_WSREP if (wsrep_emulate_bin_log) DBUG_RETURN(0); #endif /* WITH_WSREP */
wsrep_emulate_bin_log = 1, thus no savepoint is written to bin log
Ah, I missed that. Right, then I think the problem is clear. In binlog_savepoint_set(), we do at the end to initialise *sv: binlog_trans_log_savepos(thd, (my_off_t*) sv); Then in binlog_savepoint_rollback() we read back the value that binlog_savepoint_set() put in: binlog_trans_log_truncate(thd, *(my_off_t*)sv); But the handling of WITH_WSREP and wsrep_emulate_bin_log=1 is done incorrectly. Because in this case the initialisation in binlog_savepoint_set() is disabled, but the value is still used in binlog_savepoint_rollback() (the access is outside of if (!wsrep_emulate_bin_log && ...). So this is clearly a problem with the WSREP patch related to savepoints. I am not familiar with how that is supposed to work, so now you need to ask the Codership people how this is supposed to work. Either binlog_savepoint_rollback() should not access *sv in the wsrep_emulate_bin_log case, or alternatively binlog_savepoint_set() _should_ initialise *sv in this case.
Thanks for the help, I do not seem to be able to run with that --debug=d,*:t:i setup, it does not seem to progress.
Maybe it is just slow (especially if you also had Valgrind enabled), it will generate lots and lots of output. - Kristian.
Hi, The point I do not understant is at ha_savepoint we have again: if ((err= ht->savepoint_set(ht, thd, (uchar *)(sv+1)+ht->savepoint_offset))) (gdb) p *sv $3 = {prev = 0xa5a5a5a5a5a5a5a5, name = 0x7f505c007390 "A", length = 1, ha_list = 0xa5a5a5a5a5a5a5a5, mdl_savepoint = {m_stmt_ticket = 0xa5a5a5a5a5a5a5a5, m_trans_ticket = 0xa5a5a5a5a5a5a5a5}} (gdb) p *(sv+1) $2 = {prev = 0xa5a5a5a5a5a5a5a5, name = 0xa5a5a5a5a5a5a5a5 <Address 0xa5a5a5a5a5a5a5a5 out of bounds>, length = 2779096485, ha_list = 0xa5a5a5a5a5a5a5a5, mdl_savepoint = {m_stmt_ticket = 0xa5a5a5a5a5a5a5a5, m_trans_ticket = 0xa5a5a5a5a5a5a5a5}} Again access to unitialized memory, sv is ok but sv+1 not R: -- Jan Lindström Principal Engineer MariaDB | MaxScale | skype: jan_p_lindstrom www.skysql.com <http://www.skysql.com/> Twitter <http://twitter.com/skysql> Blog <http://www.skysql.com/blog/> Facebook <http://www.facebook.com/skysql> LinkedIn <http://www.linkedin.com/company/1214250> Google+ <https://plus.google.com/117544963211695643458/posts>
Jan Lindström <jplindst@mariadb.org> writes:
The point I do not understant is at ha_savepoint
we have again:
if ((err= ht->savepoint_set(ht, thd, (uchar *)(sv+1)+ht->savepoint_offset)))
(gdb) p *sv $3 = {prev = 0xa5a5a5a5a5a5a5a5, name = 0x7f505c007390 "A", length = 1, ha_list = 0xa5a5a5a5a5a5a5a5, mdl_savepoint = {m_stmt_ticket = 0xa5a5a5a5a5a5a5a5, m_trans_ticket = 0xa5a5a5a5a5a5a5a5}} (gdb) p *(sv+1) $2 = {prev = 0xa5a5a5a5a5a5a5a5, name = 0xa5a5a5a5a5a5a5a5 <Address 0xa5a5a5a5a5a5a5a5 out of bounds>, length = 2779096485, ha_list = 0xa5a5a5a5a5a5a5a5, mdl_savepoint = {m_stmt_ticket = 0xa5a5a5a5a5a5a5a5, m_trans_ticket = 0xa5a5a5a5a5a5a5a5}}
Again access to unitialized memory, sv is ok but sv+1 not
Note that it is not "sv+1". It is ((uchar *)sv+1) + ht->savepoint_offset. I suppose that sv points to a memory block containing first a SAVEPOINT object, and after that some extra bytes allocated per storage engine. Then when SAVEPOINT is executed, we call into each storage engine ht->savepoint_set() to ask it to execute the savepoint and store engine-specific data in the allocated extra bytes at (uchar *)(sv+1)+ht->savepoint_offset. And later when ROLLBACK TO SAVEPOINT is executed, we ask the engine to execute the (partial) rollback using whatever values it put into (uchar *)(sv+1)+ht->savepoint_offset itself before. So it is normal that the data is unitialised in ha_savepoint() - because this is where we are calling into each engine to initialise it. But the bug is that the WSREP patch disables this initialisation. So that when we get to ha_rollback_to_savepoint(), the binlog code gets uninitialised data. If you are wondering about the 0xa5a5a5a5a5a5a5a5 in *sv, isn't the most likely that those fields are just not used anywhere in the code? Disclaimer: This is the first time I'm looking at this code, but it seems clear enough ... - Kristian.
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
if ((err= ht->savepoint_set(ht, thd, (uchar *)(sv+1)+ht->savepoint_offset)))
Note that it is not "sv+1". It is ((uchar *)sv+1) + ht->savepoint_offset.
Sorry, I messed up that explanation. The point is: sv points to a SAVEPOINT object. But (uchar *)(sv+1) does not point to a SAVEPOINT object. It points to some uchar bytes that happen to reside in memory just after the SAVEPOINT object. Not the clearest way to write the code. So *(sv+1) is not meaningful. - Kristian.
Quoting Kristian Nielsen <knielsen@knielsen-hq.org>:
Jan Lindström <jplindst@mariadb.org> writes:
The point I do not understant is at ha_savepoint
we have again:
if ((err= ht->savepoint_set(ht, thd, (uchar *)(sv+1)+ht->savepoint_offset)))
(gdb) p *sv $3 = {prev = 0xa5a5a5a5a5a5a5a5, name = 0x7f505c007390 "A", length = 1, ha_list = 0xa5a5a5a5a5a5a5a5, mdl_savepoint = {m_stmt_ticket = 0xa5a5a5a5a5a5a5a5, m_trans_ticket = 0xa5a5a5a5a5a5a5a5}} (gdb) p *(sv+1) $2 = {prev = 0xa5a5a5a5a5a5a5a5, name = 0xa5a5a5a5a5a5a5a5 <Address 0xa5a5a5a5a5a5a5a5 out of bounds>, length = 2779096485, ha_list = 0xa5a5a5a5a5a5a5a5, mdl_savepoint = {m_stmt_ticket = 0xa5a5a5a5a5a5a5a5, m_trans_ticket = 0xa5a5a5a5a5a5a5a5}}
Again access to unitialized memory, sv is ok but sv+1 not
Note that it is not "sv+1". It is ((uchar *)sv+1) + ht->savepoint_offset.
I suppose that sv points to a memory block containing first a SAVEPOINT object, and after that some extra bytes allocated per storage engine.
Then when SAVEPOINT is executed, we call into each storage engine ht->savepoint_set() to ask it to execute the savepoint and store engine-specific data in the allocated extra bytes at (uchar *)(sv+1)+ht->savepoint_offset.
And later when ROLLBACK TO SAVEPOINT is executed, we ask the engine to execute the (partial) rollback using whatever values it put into (uchar *)(sv+1)+ht->savepoint_offset itself before.
So it is normal that the data is unitialised in ha_savepoint() - because this is where we are calling into each engine to initialise it.
But the bug is that the WSREP patch disables this initialisation. So that when we get to ha_rollback_to_savepoint(), the binlog code gets uninitialised data.
If you are wondering about the 0xa5a5a5a5a5a5a5a5 in *sv, isn't the most likely that those fields are just not used anywhere in the code?
Disclaimer: This is the first time I'm looking at this code, but it seems clear enough ...
- Kristian.
Jan, why don't you file a bug report about this? Following these emails is getting tedious... Did I understand it right that the problem is with "ROLLBACK TO SAVEPOINT...", where the designated savepoint does not exist? Did you try same test with MGC 5.5? -seppo -- http://www.codership.com seppo.jaakola@codership.com tel: +358 40 510 5938 skype: seppo_jaakola
seppo.jaakola@codership.com writes:
Did I understand it right that the problem is with "ROLLBACK TO SAVEPOINT...", where the designated savepoint does not exist? Did you
No. It is a very simple issue. The beginning of binlog_savepoint_set() reads: #ifdef WITH_WSREP if (wsrep_emulate_bin_log) DBUG_RETURN(0); #endif /* WITH_WSREP */ So binlog_savepoint_set() does nothing in the Galera case. But binlog_savepoint_rollback() reads: #ifdef WITH_WSREP if (!wsrep_emulate_bin_log ... #endif { ... } binlog_trans_log_truncate(thd, *(my_off_t*)sv); Note that the binlog_trans_log_truncate() call is outside the if, so it _is_ executed even in the Galera case. So the WSREP patch tries to disable some binlog savepoint code but does it incorrectly. It disables setting the savepoint but not rolling it back, so the latter gets invalid data. It needs to match up, so that _set() and _rollback() match each other in what they do. This issue is in the maria-10.0-galera tree. I did not check other Galera trees (not sure which trees to check). - Kristian.
Hi, I think I solved this issue by #ifdef WITH_WSREP if (!wsrep_emulate_bin_log) binlog_trans_log_truncate(thd, *(my_off_t*)sv); #endif This avoids using the uninitialized memory in Galera, this was error in merge. R: Jan
seppo.jaakola@codership.com writes:
Did I understand it right that the problem is with "ROLLBACK TO SAVEPOINT...", where the designated savepoint does not exist? Did you No. It is a very simple issue.
The beginning of binlog_savepoint_set() reads:
#ifdef WITH_WSREP if (wsrep_emulate_bin_log) DBUG_RETURN(0); #endif /* WITH_WSREP */
So binlog_savepoint_set() does nothing in the Galera case.
But binlog_savepoint_rollback() reads:
#ifdef WITH_WSREP if (!wsrep_emulate_bin_log ... #endif { ... }
binlog_trans_log_truncate(thd, *(my_off_t*)sv);
Note that the binlog_trans_log_truncate() call is outside the if, so it _is_ executed even in the Galera case.
So the WSREP patch tries to disable some binlog savepoint code but does it incorrectly. It disables setting the savepoint but not rolling it back, so the latter gets invalid data. It needs to match up, so that _set() and _rollback() match each other in what they do.
This issue is in the maria-10.0-galera tree. I did not check other Galera trees (not sure which trees to check).
- Kristian.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
-- -- Jan Lindström Principal Engineer MariaDB | MaxScale | skype: jan_p_lindstrom www.skysql.com <http://www.skysql.com/> Twitter <http://twitter.com/skysql> Blog <http://www.skysql.com/blog/> Facebook <http://www.facebook.com/skysql> LinkedIn <http://www.linkedin.com/company/1214250> Google+ <https://plus.google.com/117544963211695643458/posts>
Seppo, Hi, What means: 30925 15:10:36 [ERROR] Slave SQL: Could not execute Update_rows event on table test.table10_key_pk_parts_2_int_autoinc; Can't find record in 'table10_key_pk_parts_2_int_autoinc', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 275, Internal MariaDB error code: 1032 130925 15:10:36 [Warning] WSREP: RBR event 4 Update_rows apply warning: 120, 136 130925 15:10:36 [ERROR] WSREP: Failed to apply trx: source: dd56a506-25da-11e3-b555-0fdde9a0e702 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 12 trx_id: 2503 seqnos (l: 140, g: 136, s: 135, d: 50, ts: 1380111033191170111) 130925 15:10:36 [ERROR] WSREP: Failed to apply app buffer: seqno: 136, status: WSREP_FATAL at galera/src/replicator_smm.cpp:apply_wscoll():52 at galera/src/replicator_smm.cpp:apply_trx_ws():118 130925 15:10:36 [ERROR] WSREP: Node consistency compromized, aborting... R: -- Jan Lindström Principal Engineer MariaDB | MaxScale | skype: jan_p_lindstrom www.skysql.com <http://www.skysql.com/> Twitter <http://twitter.com/skysql> Blog <http://www.skysql.com/blog/> Facebook <http://www.facebook.com/skysql> LinkedIn <http://www.linkedin.com/company/1214250> Google+ <https://plus.google.com/117544963211695643458/posts>
Quoting Jan Lindström <jplindst@mariadb.org>:
Seppo,
Hi,
What means:
30925 15:10:36 [ERROR] Slave SQL: Could not execute Update_rows event on table test.table10_key_pk_parts_2_int_autoinc; Can't find record in 'table10_key_pk_parts_2_int_autoinc', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 275, Internal MariaDB error code: 1032 130925 15:10:36 [Warning] WSREP: RBR event 4 Update_rows apply warning: 120, 136 130925 15:10:36 [ERROR] WSREP: Failed to apply trx: source: dd56a506-25da-11e3-b555-0fdde9a0e702 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 12 trx_id: 2503 seqnos (l: 140, g: 136, s: 135, d: 50, ts: 1380111033191170111) 130925 15:10:36 [ERROR] WSREP: Failed to apply app buffer: seqno: 136, status: WSREP_FATAL at galera/src/replicator_smm.cpp:apply_wscoll():52 at galera/src/replicator_smm.cpp:apply_trx_ws():118 130925 15:10:36 [ERROR] WSREP: Node consistency compromized, aborting...
R:
This is a message of detected data inconsistency. Replicated update could not be processed because the row to be updated could not be found in this node's database. You will find file GRA_12_136.log in your data directory, and this file contains all binlog events for the offending transaction. Node did emergency shutdown due to the inconsistency. -seppo -- http://www.codership.com seppo.jaakola@codership.com tel: +358 40 510 5938 skype: seppo_jaakola
participants (7)
-
Alex Yurchenko
-
Alexander Barkov
-
Jan Lindström
-
Kristian Nielsen
-
Michael Widenius
-
Roberto Spadim
-
seppo.jaakola@codership.com