hi guys!
one question, as a mariadb user...
base64 will be exposed to sql layer? could i use TO_BASE64() and FROM_BASE64() at mysql client?
there's a MDEV in JIRA about it, if you could attach it to this patch could be nice: MDEV-4387

and with time, implement base32 and base16: MDEV-4800

base64 is very usefull, and many php users use it to "send/receive" files from database
base32 from what i know is usefull only for OTP and other password apps
base16 is a hexadecimal function HEX() and UNHEX()

thanks guys


2013/9/17 Alexander Barkov <bar@mariadb.org>
Hi Monty,

thanks for review.

I have addressed most of your suggestions. See the new version attached,
and the detailed comments inline:



On 09/12/2013 04:32 PM, Michael Widenius wrote:

Hi!

Here is the review for the code that we should put into 10.0

First the base64:

=== modified file 'mysql-test/t/func_str.test'
--- mysql-test/t/func_str.test  2013-05-07 11:05:09 +0000
+++ mysql-test/t/func_str.test  2013-08-28 13:14:24 +0000
@@ -1555,3 +1555,118 @@ drop table t1,t2;
  --echo # End of 5.5 tests
  --echo #

+
+--echo #
+--echo # Start of 5.6 tests
+--echo #

Shouldn't this be start of 10.0 tests ?
(I know that this code is from MySQL 5.6, but still for us this is 10.0...)

This code is (almost) copy-and-paste from MySQL-5.6.
I think it's a good idea when looking inside a test file
to be able to see which tests are coming from MySQL-5.6,
and which tests are coming from MariaDB-10.0.

I'd suggest to keep "Start of 5.6 tests" in this particular case,
and also when merging the tests for the other merged MySQL-5.6 features.




=== modified file 'mysys/base64.c'
--- mysys/base64.c      2011-06-30 15:46:53 +0000
+++ mysys/base64.c      2013-03-09 06:22:59 +0000
@@ -1,5 +1,4 @@
-/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc.
-   Use is subject to license terms.
+/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.

Removed 'all rights reserved'. You can't have that for GPL code.

If there is any new code from us, please add a copyright message for the
MariaDB foundation too!

Removed 'all rights reserved' and added "MariaDB", as there are some
our own changes.




     This program is free software; you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
@@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ
                               "abcdefghijklmnopqrstuvwxyz"
                               "0123456789+/";

+/**
+ * Maximum length base64_needed_encoded_length()
+ * can handle without overflow.
+ */
+int
+base64_encode_max_arg_length()
+{
+#if (SIZEOF_INT == 8)
+  /*
+    6827690988321067803 ->   9223372036854775805
+    6827690988321067804 ->  -9223372036854775807
+  */

Please describe from where the following number comes from
(so that anyone can recalculate and check the number if needed):

I added more comments where these numbers come from.



+  return 0x5EC0D4C77B03531BLL
+#else
+  /*
+    1589695686 ->  2147483646
+    1589695687 -> -2147483645
+  */
+  return 0x5EC0D4C6;
+#endif
+}
+

  int
  base64_needed_encoded_length(int length_of_data)
@@ -39,6 +60,21 @@ base64_needed_encoded_length(int length_
  }


+/**
+ * Maximum length base64_needed_decoded_length()
+ * can handle without overflow.
+ */
+int
+base64_decode_max_arg_length()
+{

The following also need a comment.

Also added comments.



+#if (SIZEOF_INT == 8)
+  return 0x2AAAAAAAAAAAAAAALL;
+#else
+  return 0x2AAAAAAA;
+#endif
+}
+
+

<cut>

+/**
+ * Get the next character from a base64 encoded stream.
+ * Skip spaces, then scan the next base64 character or a pad character
+ * and collect bits into decoder->c.
+ *
+ * @param  decoder  Pointer to MY_BASE64_DECODER
+ * @return
+ *  FALSE on success (a valid base64 encoding character found)
+ *  TRUE  on error (unexpected character or unexpected end-of-input found)
+ */

Add an empty line here.

Done.



It would also be good to have a good description of the base64 format
in this file so that one can understand what the functions are
supposed to do (or at least a link to the specification).

Some thing I figured out by asking bar on IRC:

- We never generate spaces
- When decoding, we allow spaces anywhere.
- The end of an 64 base  decoded string is always padded with '='
   so that the final length is dividable with 4.

I added a reference to http://en.wikipedia.org/wiki/Base64,
as well all added these your comments in proper places of the code.




+static inline my_bool
+my_base64_decoder_getch(MY_BASE64_DECODER *decoder)

Remove inline from the above; It's likely to make the code slower, not
faster as this is a big function with lot of if's.

Removed "inline".



  {
-  char b[3];
-  size_t i= 0;
-  char *dst_base= (char *)dst;
-  char const *src= src_base;
-  char *d= dst_base;
-  size_t j;
+  if (my_base64_decoder_skip_spaces(decoder))
+    return TRUE; /* End-of-input */

-  while (i < len)
+  if (!my_base64_add(decoder)) /* Valid base64 character found */
    {
-    unsigned c= 0;
-    size_t mark= 0;
+    if (decoder->mark)
+    {
+      /* If we have scanned '=' already, then only '=' is valid */
+      DBUG_ASSERT(decoder->state == 3);
+      decoder->error= 1;
+      decoder->src--;
+      return TRUE; /* expected '=', but encoding character found */
+    }
+    decoder->state++;
+    return FALSE;
+  }

If you want to have the funtion inline, then move the following to
another not inline function;  We don't need to generate code for this
for every function call:

+
+  /* Process error */
+  switch (decoder->state)
+  {
+  case 0:
+  case 1:
+    decoder->src--;
+    return TRUE; /* base64 character expected */
+    break;
+
+  case 2:
+  case 3:
+    if (decoder->src[-1] == '=')
+    {
+      decoder->error= 0; /* Not an error - it's a pad character */
+      decoder->mark++;
+    }
+    else
+    {
+      decoder->src--;
+      return TRUE; /* base64 character or '=' expected */
+    }
+    break;
+  default:
+    DBUG_ASSERT(0);
+    return TRUE; /* Wrong state, should not happen */
+  }

<cut>

Add to the comment for base64_decode that the 'dst' buffer has to be
at least 2 character longer than what is needed to decode src_base.

I added this comment:

* Note: 'dst' must have sufficient space to store the decoded data.
* Use base64_needed_decoded_length() to calculate the correct space size.






+int
+base64_decode(const char *src_base, size_t len,
+              void *dst, const char **end_ptr, int flags)
+{
+    if (my_base64_decoder_getch(&decoder) ||
+        my_base64_decoder_getch(&decoder) ||
+        my_base64_decoder_getch(&decoder) ||
+        my_base64_decoder_getch(&decoder))
+      break;

Generating the error handling with a switch for every of the
above is wasteful. See previous comment.

Right. my_base64_decoder_getch() is not inline anymore.



+  /* Return error if there are more non-space characters */
+  decoder.state= 0;
+  if (!my_base64_decoder_skip_spaces(&decoder))
+    decoder.error= 1;
+
    if (end_ptr != NULL)

Note that base64_needed_decoded_length() used ceil() when it's not needed.

Removed.




<cut>

=== modified file 'sql/item_strfunc.cc'
@@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix_length_a
     set_persist_maybe_null(1);
  }

+
+void Item_func_to_base64::fix_length_and_dec()
+{
+  maybe_null= args[0]->maybe_null;
+  collation.set(default_charset(), DERIVATION_COERCIBLE, MY_REPERTOIRE_ASCII);

Maybe better to cast both arguments to (ulonglong) as the rest of the code
is depending on this.

Also, why is base64_encode_max_arg_length() int instead of uint or
even better size_t?

This is because base64_encode() and base64_decode() are used
in the replication code using "int" as return value.

When exposing TO_BASE64() and FROM_BASE64() to the SQL level,
I did not want to touch the replication code.



+  if (args[0]->max_length > (uint) base64_encode_max_arg_length())
+  {
+    maybe_null= 1;
+    fix_char_length_ulonglong((ulonglong) base64_encode_max_arg_length());
+  }
+  else
+  {
+    int length= base64_needed_encoded_length((int) args[0]->max_length);
+    DBUG_ASSERT(length > 0);

Don't think assert is needed as the function gurantees it already.

+    fix_char_length_ulonglong((ulonglong) length - 1);
+  }
+}

<cut>

+String *Item_func_from_base64::val_str(String *str)
+{
+  String *res= args[0]->val_str_ascii(str);
+  bool too_long= false;
+  int length;
+  const char *end_ptr;
+
+  if (!res ||
+      res->length() > (uint) base64_decode_max_arg_length() ||
+      (too_long=
+       ((uint) (length= base64_needed_decoded_length((int) res->length())) >
+        current_thd->variables.max_allowed_packet)) ||
+      tmp_value.alloc((uint) length) ||
+      (length= base64_decode(res->ptr(), (int) res->length(),
+                             (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 ||
+      end_ptr < res->ptr() + res->length())
+  {

Shouldn't we get a readable error or warning if the string contained wrong
characters?
Now it looks like we will just return null.

Something like 'Malformed base64 string. Error at position %d" would
be nice.

Good idea. I have added a warning. Now it looks clear what went wrong
if FROM_BASE64() returns NULL.





+    null_value= 1; // NULL input, too long input, OOM, or badly formed input
+    if (too_long)
+    {
+      push_warning_printf(current_thd, Sql_condition::WARN_LEVEL_WARN,
+                          ER_WARN_ALLOWED_PACKET_OVERFLOWED,
+                          ER(ER_WARN_ALLOWED_PACKET_OVERFLOWED), func_name(),
+                          current_thd->variables.max_allowed_packet);
+    }
+    return 0;
+  }
+  tmp_value.length((uint) length);
+  null_value= 0;
+  return &tmp_value;
+}

<cut>

<cut>


_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp




--
Roberto Spadim
SPAEmpresarial