Re: [Maria-developers] On constructing ref access from mis-matched charset comparisons

24 Feb 2017

      Hi, Sergey!

On Feb 24, Sergey Petrunia wrote:
...
== Long form ==
=== Example dataset ===
create table t0 (
  a varchar(32) character set utf8 collate utf8_general_ci
);
create table t1 (
  a varchar(32) character set latin1 collate latin1_swedish_ci,
  col1 varchar(32),
  key(a)
);
insert into t0 select a from ten;
insert into t1 select a,a from one_k;
=== Queries ===
MariaDB [test]> explain 
    -> select * from t0, t1 where t0.a=t1.a;
+------+-------------+-------+------+---------------+------+---------+------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  |
+------+-------------+-------+------+---------------+------+---------+------+
|    1 | SIMPLE      | t0    | ALL  | NULL          | NULL | NULL    | NULL |
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |
+------+-------------+-------+------+---------------+------+---------+------+
2 rows in set (0.00 sec)
Oops, the index t1(a) is not used.  Still it feels like the optimizer could
* Try to convert the value of t0.a into latin1.  
* If it can't be represented in latin1, then we know that no row
  in t1 has t1.a=t0.a.
* make a lookup on t1.a= convert(t0.a using latin1) , using t1.a's collation
  rules.
Is this true [for some charsets] ?
I suppose it's true for *some charsets*, it's not true generally.
It is certainly not true that if you can convert, you can compare. For
example, let's compare t1.a with a latin1_german2_ci literal, for
example, "ö". You can convert "ö" to latin1_swedish_ci. But

  t1.a= "ö" collate latin1_german2_ci

is not the same as

  t1.a= "ö" collate latin1_swedish_ci

In the first case you'll find 'oe', in the second - you won't.

Regards,
Sergei
Chief Architect MariaDB
and security@mariadb.org

Re: [Maria-developers] On constructing ref access from mis-matched charset comparisons

Sergei Golubchik