Hi Alexander, I've got a question about mis-matched charset comparisons and ref access method. == Short form == I know that VARCHAR comparisons over mis-matching charsets cannot be used for constructing index lookups into the index over the "narrower" character set column. But is this a real limitation or just an optimizer deficiency? If it is the latter, do people hit it in the real world sufficiently often to warrant lifting it? == Long form == === Example dataset === create table t0 ( a varchar(32) character set utf8 collate utf8_general_ci ); create table t1 ( a varchar(32) character set latin1 collate latin1_swedish_ci, col1 varchar(32), key(a) ); create table t2 ( a varchar(32) character set utf8 collate utf8_general_ci, col1 varchar(32), key(a) ); insert into t0 select a from ten; insert into t1 select a,a from one_k; insert into t2 select a,a from one_k; === Queries === MariaDB [test]> explain -> select * from t0, t2 where t0.a=t2.a; +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 1 | SIMPLE | t2 | ref | a | a | 99 | test.t0.a | 1 | | +------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+ Ok, good. MariaDB [test]> explain -> select * from t0, t1 where t0.a=t1.a; +------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+ | 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where; Using join buffer (flat, BNL join) | +------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+ 2 rows in set (0.00 sec) Oops, the index t1(a) is not used. FORMAT=JSON shows the reason: "attached_condition": "(t0.a = convert(t1.a using utf8))" Still it feels like the optimizer could * Try to convert the value of t0.a into latin1. * If it can't be represented in latin1, then we know that no row in t1 has t1.a=t0.a. * make a lookup on t1.a= convert(t0.a using latin1) , using t1.a's collation rules. Is this true [for some charsets] ? If yes, is it worth doing? BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog