On Mon, Feb 21, 2011 at 03:20:13PM -0800, Igor Babaev wrote:
=== modified file 'sql/sql_select.cc' --- a/sql/sql_select.cc 2011-02-06 04:57:03 +0000 +++ b/sql/sql_select.cc 2011-02-12 04:41:22 +0000 ... @@ -8419,6 +8438,11 @@ sort_by_tab->type= JT_ALL; sort_by_tab->read_first_record= join_init_read_record; } + else if (sort_by_tab->type == JT_HASH_NEXT) + { + sort_by_tab->type= JT_HASH; + sort_by_tab->read_first_record= join_init_read_record; + } } break; } I have a question only to this part of the patch: I don't understand how the above code could work.
The if statement above the one you've added checks if sort_by_tab was using [ordered] index scan to produce records in orderer, and if yes, switches it to a full scan (because use of join buffering will break ordering anyway).
I suppose the part you've added does something similar, but I totally fail to understand what exactly it does.
The above code says: if hash join is used do not use full index scan to look through the joined table, rather use full table scan in this case.
I've put a DBUG_ASSERT(0) right after the "sort_by_tab->type= JT_HASH;" line and ran the testsuite. The assertion didn't fire. I did this because I suspect (although can't provide a sound proof right away) that the part inside the else if () { ...} is deadcode. Do you think you could easily prove me wrong by giving a testcase? If constructing testcase is complicated, let's meet online and discuss my suspicions about the deadcode.
Now, some comments on the new EXPLAIN output:
ISSUE1.
Consider a query: MariaDB [j1]> explain select * from t3, t4 where t3.col1=t4.col2; +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ | 1 | SIMPLE | t3 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 1 | SIMPLE | t4 | hash_ALL | NULL | #hash#$hj | 5 | j1.t3.col1 | 1000 | Using where; Using join buffer (flat, BNLH join) | +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ 2 rows in set (0.00 sec)
AFAIU #hash#$hj means the hash key on the hashtable that's used for hash join. I think the name is too convoluted. The hashtable has only one key, if we need to tell it from t4's indexes it would be sufficient to use "#hashkey#", without the cryptic "$hj".
As for "#" characters: EXPLAIN already shows "internal" tables/indexes:
If you look into #maria-call log you'll see that I followed strictly Monty's instructions: to use the prefix #hash# before the index name I used to use, i.e. <used_index> should be substituted for #hash#<used_index>.
The only deviation I afforded myself was: I used $hj instead of hj_key.
Ok. I still think that my objections are valid and current EXPLAIN output is not the best but we could discuss/address this outside the scope of this bugfix. <skip>
What is the index that's used for scanning the table t1? "#hash#cu1" shows that hash join optimizer used access on index "cu1" in its analysis, but generally it doesn't mean that full index scan on table t1 was done on index cu1, doesn't it? I would very much prefer to see #hash#cu1:USED_INDEX there (or taking into account previous suggestions, hashkey:USED_INDEX). It will also be consistent with how hash_range and hash_index_merge use that column.
Yes, the scan index is missing here. I will add.
OK.
ISSUE4 If key and key_len use hash_part:real_part notation, i.e. use semicolon, why use underscore in hash_ALL, hash_index, etc? Won't hash:ALL, hash:index look more consistent with other columns?
again: I follow here Monty's recommendations. Nothing more, nor less.
BR Sergey -- Sergey Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog