Hi, Sergey! On Feb 24, Sergey Petrunia wrote:
== Long form ==
=== Example dataset ===
create table t0 ( a varchar(32) character set utf8 collate utf8_general_ci );
create table t1 ( a varchar(32) character set latin1 collate latin1_swedish_ci, col1 varchar(32), key(a) );
insert into t0 select a from ten; insert into t1 select a,a from one_k;
=== Queries ===
MariaDB [test]> explain -> select * from t0, t1 where t0.a=t1.a; +------+-------------+-------+------+---------------+------+---------+------+ | id | select_type | table | type | possible_keys | key | key_len | ref | +------+-------------+-------+------+---------------+------+---------+------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | +------+-------------+-------+------+---------------+------+---------+------+ 2 rows in set (0.00 sec)
Oops, the index t1(a) is not used. Still it feels like the optimizer could
* Try to convert the value of t0.a into latin1. * If it can't be represented in latin1, then we know that no row in t1 has t1.a=t0.a. * make a lookup on t1.a= convert(t0.a using latin1) , using t1.a's collation rules.
Is this true [for some charsets] ?
I suppose it's true for *some charsets*, it's not true generally. It is certainly not true that if you can convert, you can compare. For example, let's compare t1.a with a latin1_german2_ci literal, for example, "ö". You can convert "ö" to latin1_swedish_ci. But t1.a= "ö" collate latin1_german2_ci is not the same as t1.a= "ö" collate latin1_swedish_ci In the first case you'll find 'oe', in the second - you won't. Regards, Sergei Chief Architect MariaDB and security@mariadb.org