[Maria-developers] MDEV-13118 Wrong results with LOWER and UPPER and subquery

newer
[Maria-developers] To MDEV-6608...

older
[Maria-developers] Selecting Blob...

Alexander Barkov

20 Jun 2017 20 Jun '17

10:45 a.m.

Hello Sergei, Please review a patch for MDEV-13118. Thanks!

Show replies by thread

Alexander Barkov

22 Jul 22 Jul

7:38 a.m.

New subject: [Maria-developers] MDEV-13118 Wrong results with LOWER and UPPER and subquery

Hello Sergei, I sent a broken patch last time. Sending the patch again. Thanks! On 06/20/2017 12:45 PM, Alexander Barkov wrote:

...

Hello Sergei,

Please review a patch for MDEV-13118.

Thanks!

Sergei Golubchik

24 Oct 24 Oct

11:09 a.m.

New subject: [Maria-developers] MDEV-13118 Wrong results with LOWER and UPPER and subquery

Hi, Alexander! On Jul 22, Alexander Barkov wrote:

...

On 06/20/2017 12:45 PM, Alexander Barkov wrote:

...
Hello Sergei,

Please review a patch for MDEV-13118.

Thanks!

...

commit dc7352fa2bc21fd9b66b12ec33cdacf88b478157 Author: Alexander Barkov Date: Tue Jun 20 12:44:36 2017 +0400

MDEV-13118 Wrong results with LOWER and UPPER and subquery

This problem is similar to MDEV-10306.

1. Fixing Item_str_conv::val_str(String *str) to return the result in "str", and to use tmp_value only as a temporary buffer for args[0]->val_str(). The new code version now guarantees that the result is always returned in "str". The trick with copy_if_not_alloced() is not used any more.

2. Removing the optimization that *some* character sets used in casedn() and caseup(), which allowed (and required) to change the case in-place, overwriting the string passed as the "src" argument. Now all CHARSET_INFO's work in the same way: non of them change the source string in-place, all of them now convert case from the source string to the destination string, leaving the source string untouched.

I don't understand why it was needed. If the goal is to return the result in 'str' and never in 'tmp_value', it could've been achieved without doing 1. or 2., something like diff --git a/sql/item_strfunc.cc b/sql/item_strfunc.cc index 89e55234482..f1837a4477b 100644 --- a/sql/item_strfunc.cc +++ b/sql/item_strfunc.cc @@ -1560,30 +1560,28 @@ String *Item_str_conv::val_str(String *str) { DBUG_ASSERT(fixed == 1); String *res; - if (!(res=args[0]->val_str(str))) - { - null_value=1; /* purecov: inspected */ - return 0; /* purecov: inspected */ - } - null_value=0; if (multiply == 1) { uint len; - res= copy_if_not_alloced(&tmp_value, res, res->length()); + if ((null_value= !(res= args[0]->val_str(str)))) + return 0; + res= copy_if_not_alloced(str, res, res->length()); len= converter(collation.collation, (char*) res->ptr(), res->length(), (char*) res->ptr(), res->length()); DBUG_ASSERT(len <= res->length()); res->length(len); } else { + if ((null_value= !(res= args[0]->val_str(&tmp_value)))) + return 0; uint len= res->length() * multiply; - tmp_value.alloc(len); - tmp_value.set_charset(collation.collation); + str->alloc(len); + str->set_charset(collation.collation); len= converter(collation.collation, (char*) res->ptr(), res->length(), - (char*) tmp_value.ptr(), len); - tmp_value.length(len); - res= &tmp_value; + (char*) str->ptr(), len); + str->length(len); + res= str; } return res; } Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Alexander Barkov

2 Nov 2 Nov

2:26 p.m.

New subject: [Maria-developers] MDEV-13118 Wrong results with LOWER and UPPER and subquery

Hello Sergei, Thanks for your reviews. Please find my comments below. On 10/24/2017 01:09 PM, Sergei Golubchik wrote:

...

Hi, Alexander!

On Jul 22, Alexander Barkov wrote:

...
On 06/20/2017 12:45 PM, Alexander Barkov wrote:

...
Hello Sergei,

Please review a patch for MDEV-13118.

Thanks!

...
commit dc7352fa2bc21fd9b66b12ec33cdacf88b478157 Author: Alexander Barkov Date: Tue Jun 20 12:44:36 2017 +0400

MDEV-13118 Wrong results with LOWER and UPPER and subquery

This problem is similar to MDEV-10306.

1. Fixing Item_str_conv::val_str(String *str) to return the result in "str", and to use tmp_value only as a temporary buffer for args[0]->val_str(). The new code version now guarantees that the result is always returned in "str". The trick with copy_if_not_alloced() is not used any more.

2. Removing the optimization that *some* character sets used in casedn() and caseup(), which allowed (and required) to change the case in-place, overwriting the string passed as the "src" argument. Now all CHARSET_INFO's work in the same way: non of them change the source string in-place, all of them now convert case from the source string to the destination string, leaving the source string untouched.

I don't understand why it was needed. If the goal is to return the result in 'str' and never in 'tmp_value', it could've been achieved without doing 1. or 2., something like

There was a symmetry in the charset library in caseup() and casedn() implementations for various charsets: - charsets whose case conversion does not change octet length implemented "inplace" conversion. - charsets whose case conversion change octet length implemented "copying" conversion. That was done specifically for Item_str_conv. Now with Item_str_conv returning the value in "str" anyway: - This optimization is not needed any more. There is no sense to do case conversion inplace and then immediately copy the modified string to another String (or the other way around: copy to a new String and then convert inplace). - It's good to make the charset library symmetric for all charsets

...

diff --git a/sql/item_strfunc.cc b/sql/item_strfunc.cc index 89e55234482..f1837a4477b 100644 --- a/sql/item_strfunc.cc +++ b/sql/item_strfunc.cc @@ -1560,30 +1560,28 @@ String *Item_str_conv::val_str(String *str) { DBUG_ASSERT(fixed == 1); String *res; - if (!(res=args[0]->val_str(str))) - { - null_value=1; /* purecov: inspected */ - return 0; /* purecov: inspected */ - } - null_value=0; if (multiply == 1) { uint len; - res= copy_if_not_alloced(&tmp_value, res, res->length()); + if ((null_value= !(res= args[0]->val_str(str)))) + return 0; + res= copy_if_not_alloced(str, res, res->length()); len= converter(collation.collation, (char*) res->ptr(), res->length(), (char*) res->ptr(), res->length()); DBUG_ASSERT(len <= res->length()); res->length(len); } else { + if ((null_value= !(res= args[0]->val_str(&tmp_value)))) + return 0; uint len= res->length() * multiply; - tmp_value.alloc(len); - tmp_value.set_charset(collation.collation); + str->alloc(len); + str->set_charset(collation.collation); len= converter(collation.collation, (char*) res->ptr(), res->length(), - (char*) tmp_value.ptr(), len); - tmp_value.length(len); - res= &tmp_value; + (char*) str->ptr(), len); + str->length(len); + res= str; } return res; }

Regards, Sergei Chief Architect MariaDB and security@mariadb.org

Sergei Golubchik

12 Jul 12 Jul

2:13 p.m.

New subject: [Maria-developers] MDEV-13118 Wrong results with LOWER and UPPER and subquery

Hi, Alexander! On Nov 02, Alexander Barkov wrote:

...

...
...
commit dc7352fa2bc21fd9b66b12ec33cdacf88b478157 Author: Alexander Barkov Date: Tue Jun 20 12:44:36 2017 +0400

MDEV-13118 Wrong results with LOWER and UPPER and subquery

This problem is similar to MDEV-10306.

1. Fixing Item_str_conv::val_str(String *str) to return the result in "str", 2. Removing the optimization that *some* character sets used in casedn() and caseup(), which allowed (and required) to change the case in-place,

I don't understand why it was needed. If the goal is to return the result in 'str' and never in 'tmp_value', it could've been achieved without doing 1. or 2., something like

There was a symmetry in the charset library in caseup() and casedn() implementations for various charsets: That was done specifically for Item_str_conv. Now with Item_str_conv returning the value in "str" anyway:

- This optimization is not needed any more. There is no sense to do case conversion inplace and then immediately copy the modified string to another String (or the other way around: copy to a new String and then convert inplace).

- It's good to make the charset library symmetric for all charsets

I see. Could you split this commit in two, then? Should be as easy as using 'git citool' to move all strings/* and include/* changes into a separate changeset with the comment like Simplify caseup() and casedn() in charsets After the MDEV-13118 fix there's no code in the server that wants caseup/casedn to change the argument in place for simple charsets. Let's remove this logic and always return the result in a new string for all charsets, both simple and complex. Regards, Sergei Chief Architect MariaDB and security@mariadb.org

2207

Age (days ago)

2594

Last active (days ago)

List overview

Download

4 comments

2 participants

participants (2)

Alexander Barkov
Sergei Golubchik