Hi, Alexander! On Jun 16, Alexander Barkov wrote:
Agreed. This version fixes around 16 classes that had the same problem, and which only needed to swap "str" and "tmp_value" in their val_str() implementations.
As agreed on slack, the following three classes: - Item_str_conv - Item_func_make_set - Item_char_typecast with more complex implementations will be fixed separately. I'll create separate MDEVs for these three classes.
Right, thanks!
And, by the way, could you also fix the comment in item.h around Item::val_str(String *) not to say that it can only change allocated strings? May be it should say about const (String::make_const) strings instead. Thanks. Also you could add something about this your CONCAT fix. Like, the item can use internal buffer as a temporary storage, but the result should be in the String argument, not the other way around.
Can you please suggest proper comments and proper places for them? I got stuck with that :)
Let's see. Old comment: ============================ NOTE Buffer passed via argument should only be used if the item itself doesn't have an own String buffer. In case when the item maintains it's own string buffer, it's preferable to return it instead to minimize number of mallocs/memcpys. The caller of this method can modify returned string, but only in case when it was allocated on heap, (is_alloced() is true). This allows the caller to efficiently use a buffer allocated by a child without having to allocate a buffer of it's own. The buffer, given to val_str() as argument, belongs to the caller and is later used by the caller at it's own choosing. A few implications from the above: - unless you return a string object which only points to your buffer but doesn't manages it you should be ready that it will be modified. - even for not allocated strings (is_alloced() == false) the caller can change charset (see Item_func_{typecast/binary}. XXX: is this a bug? - still you should try to minimize data copying and return internal object whenever possible. ============================ New comment suggestion: ============================ NOTE The caller can modify the returned String, if it's not marked "const" (with the String::mark_as_const() method). That means that if the item returns its own internal buffer (e.g. tmp_value), it *must* be marked "const" [1]. So normally it's preferrable to return the result value in the String, that was passed as an argument. But, for example, SUBSTR() returns a String that simply points into the buffer of SUBSTR()'s args[0]->val_str(). Such a String is always "const", so it's ok to use tmp_value for that and avoid reallocating/copying of the argument String. [1] consider SELECT CONCAT(f, ":", f) FROM (SELECT func() AS f); here the return value of f() is used twice in the top-level select, and if they share the same tmp_value buffer, modifying the first one will implicitly modify the second too. ============================
commit cec33b1026c54bd1e0e92d26053843b3bced7c81 Author: Alexander Barkov <bar@mariadb.org> Date: Fri Jun 16 12:58:01 2017 +0400
MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
The bug happens because of a combination of unfortunate circumstances:
1. Arguments args[0] and args[2] of Item_func_concat point recursively (through Item_direct_view_ref's) to the same Item_func_conv_charset. Both args[0]->args[0]->ref[0] and args[2]->args[0]->ref[0] refer to this Item_func_conv_charset.
2. When Item_func_concat::args[0]->val_str() is called, Item_func_conv_charset::val_str() writes its result to Item_func_conc_charset::tmp_value.
3. Then, for optimization purposes (to avoid copying), Item_func_substr::val_str() initializes Item_func_substr::tmp_value to point to the buffer fragment owned by Item_func_conv_charset::tmp_value Item_func_substr::tmp_value is returned as a result of Item_func_concat::args[0]->val_str().
4. Due to optimization to avoid memory reallocs, Item_func_concat::val_str() remembers the result of args[0]->val_str() in "res" and further uses "res" to collect the return value.
5. When Item_func_concat::args[2]->val_str() is called, Item_func_conv_charset::tmp_value gets overwritten (see #1), which effectively overwrites args[0]'s Item_func_substr::tmp_value (see #3), which effectively overwrites "res" (see #4).
This patch does the following:
a. Changes Item_func_conv_charset::val_str(String *str) to use tmp_value and str the other way around. After this change tmp_value is used to store a temporary result, while str is used to return the value. The fixes the second problem (without SUBSTR): SELECT CONCAT(t2,'-',t2) c2 FROM (SELECT CONVERT(t USING latin1) t2 FROM t1) sub; As Item_func_concat::val_str() supplies two different buffers when calling args[0]->val_str() and args[2]->val_str(), in the new reduction the result created during args[0]->val_str() does not get overwritten by args[2]->val_str().
b. Fixing the same problem in val_str() for similar classes
Item_func_to_base64 Item_func_from_base64 Item_func_weight_string Item_func_hex Item_func_unhex Item_func_quote Item_func_compress Item_func_uncompress Item_func_des_encrypt Item_func_des_decrypt Item_func_conv_charset Item_func_reverse Item_func_soundex Item_func_aes_encrypt Item_func_aes_decrypt Item_func_buffer
c. Fixing Item_func::val_str_from_val_str_ascii() the same way. Now Item_str_ascii_func::ascii_buff is used for temporary value, while the parameter passed to val_str() is used to return the result. This fixes the same problem when conversion (from ASCII to e.g. UCS2) takes place. See the ctype_ucs.test for example queries that returned wrong results before the fix.
d. Some Item_func descendand classes had temporary String buffers (tmp_value and tmp_str), but did not really use them. Removing these temporary buffers from:
Item_func_decode_histogram Item_func_format Item_func_binlog_gtid_pos Item_func_spatial_collection:
e. Removing Item_func_buffer::tmp_value, because it's not used any more.
f. Renaming Item_func_[un]compress::buffer to "tmp_value", for consistency with other classes.
Note, this patch does not fix the following classes (although they have a similar problem):
Item_str_conv Item_func_make_set Item_char_typecast
They have a complex implementations and simple swapping between "tmp_value" and "str" won't work. These classes will be fixed separately.
Looks ok, thanks! Regards, Sergei Chief Architect MariaDB and security@mariadb.org