Re: [Maria-developers] Fwd: [Commits] Rev 2901: BNLH algorithm always used a full table scan over the joined table in file:///home/igor/maria/maria-5.3-mwl128-hashrange/

On Fri, Feb 11, 2011 at 08:49:37PM -0800, Igor Babaev wrote:
Hi Sergey,
Please review this patch that fixes a defect of the current implementation of the mwl #128 that never couples hash join with range/index_merge scans. The patch also adds prefix #hash# to the names of the used hash indexes. (we agreed upon this at the last optimizer meeting).
Regards, Igor.
-------- Original Message -------- Subject: [Commits] Rev 2901: BNLH algorithm always used a full table scan over the joined table in file:///home/igor/maria/maria-5.3-mwl128-hashrange/ Date: Fri, 11 Feb 2011 20:41:23 -0800 (PST) From: Igor Babaev <igor@askmonty.org> Reply-To: maria-developers@lists.launchpad.net To: <commits@mariadb.org>
At file:///home/igor/maria/maria-5.3-mwl128-hashrange/
------------------------------------------------------------ revno: 2901 revision-id: igor@askmonty.org-20110212044122-w9n3jdk3d2ps0w3o parent: igor@askmonty.org-20110207231903-3rqbfs50d33lk3r9 committer: Igor Babaev <igor@askmonty.org> branch nick: maria-5.3-mwl128-hashrange timestamp: Fri 2011-02-11 20:41:22 -0800 message: BNLH algorithm always used a full table scan over the joined table even in the cases when there existed range/index-merge scans that were cheaper than the full table scan. This was a defect/bug of the implementation of mwl #128. Now hash join can work not only with full table scan of the joined table, but also with full index scan, range and index-merge scans. Accordingly, in the cases when hash join is used the column 'type' in the EXPLAINs can contain now 'hash_ALL', 'hash_index', 'hash_range' and 'hash_index_merge'. If hash join is coupled with a range/index_merge scan then the columns 'key' and 'key_len' contain info not only on the used hash index, but also on the indexes used for the scan. ... === modified file 'sql/sql_select.cc' --- a/sql/sql_select.cc 2011-02-06 04:57:03 +0000 +++ b/sql/sql_select.cc 2011-02-12 04:41:22 +0000 ... @@ -8419,6 +8438,11 @@ sort_by_tab->type= JT_ALL; sort_by_tab->read_first_record= join_init_read_record; } + else if (sort_by_tab->type == JT_HASH_NEXT) + { + sort_by_tab->type= JT_HASH; + sort_by_tab->read_first_record= join_init_read_record; + } } break; } I have a question only to this part of the patch: I don't understand how the above code could work.
The if statement above the one you've added checks if sort_by_tab was using [ordered] index scan to produce records in orderer, and if yes, switches it to a full scan (because use of join buffering will break ordering anyway). I suppose the part you've added does something similar, but I totally fail to understand what exactly it does. Now, some comments on the new EXPLAIN output: ISSUE1. Consider a query: MariaDB [j1]> explain select * from t3, t4 where t3.col1=t4.col2; +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ | 1 | SIMPLE | t3 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 1 | SIMPLE | t4 | hash_ALL | NULL | #hash#$hj | 5 | j1.t3.col1 | 1000 | Using where; Using join buffer (flat, BNLH join) | +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ 2 rows in set (0.00 sec) AFAIU #hash#$hj means the hash key on the hashtable that's used for hash join. I think the name is too convoluted. The hashtable has only one key, if we need to tell it from t4's indexes it would be sufficient to use "#hashkey#", without the cryptic "$hj". As for "#" characters: EXPLAIN already shows "internal" tables/indexes: MariaDB [j1]> explain select * from (select * from t1) X; +----+-------------+------------+------+---------------+------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+------+---------------+------+---------+------+------+-------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1000 | | | 2 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | | +----+-------------+------------+------+---------------+------+---------+------+------+-------+ # The following is from MWL#90 tree: MariaDB [test]> explain select * from t10 where a in (select max(b) from t10 group by a); +----+-------------+-------------+--------+---------------+--------------+---------+------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+--------+---------------+--------------+---------+------------+------+---------------------------------+ | 1 | PRIMARY | t10 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 5 | test.t10.a | 1 | | | 2 | SUBQUERY | t10 | ALL | NULL | NULL | NULL | NULL | 1000 | Using temporary; Using filesort | +----+-------------+-------------+--------+---------------+--------------+---------+------------+------+---------------------------------+ 3 rows in set (0.01 sec) # The following is from current 5.3: MariaDB [j1]> explain select distinct * from t1 where a in (select b from t2); +----+-------------+------------+--------+---------------+------------+---------+------+------+-----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+---------------+------------+---------+------+------+-----------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using temporary | | 1 | PRIMARY | subselect2 | eq_ref | unique_key | unique_key | 5 | func | 1 | | | 2 | SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 1000 | Distinct | +----+-------------+------------+--------+---------------+------------+---------+------+------+-----------------+ ... and it either uses no quoting at all (see "unique_key"), or angle brackets quoting (see <subquery2>). I think we should not multiply the ways of quoting, so please consider switching to <quoting> or to not using any quoting at all. ISSUE2 MariaDB [j1]> EXPLAIN -> SELECT City.Name, Country.Name, CountryLanguage.Language -> FROM City,Country,CountryLanguage -> WHERE City.Country=Country.Code AND -> CountryLanguage.Country=Country.Code AND -> City.Name LIKE 'L%' AND Country.Population > 3000000 AND -> CountryLanguage.Percentage > 50 AND -> LENGTH(Language) < LENGTH(City.Name) - 2; +----+-------------+-----------------+----------+--------------------+---------------+---------+----------------------------+------+--------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+----------+--------------------+---------------+---------+----------------------------+------+--------------------------------------------------+ | 1 | SIMPLE | CountryLanguage | ALL | PRIMARY,Percentage | NULL | NULL | NULL | 984 | Using where | | 1 | SIMPLE | Country | hash_ALL | PRIMARY | #hash#PRIMARY | 3 | j1.CountryLanguage.Country | 239 | Using where; Using join buffer (flat, BNLH join) | | 1 | SIMPLE | City | hash_ALL | Country | #hash#Country | 3 | j1.CountryLanguage.Country | 4079 | Using where; Using join buffer (flat, BNLH join) | +----+-------------+-----------------+----------+--------------------+---------------+---------+----------------------------+------+--------------------------------------------------+ 3 rows in set (0.01 sec) "key" column shows "#hash#PRIMARY". I suppose, this is because hash join opimizer is internally hooked into ref optimizer, and it used the PRIMARY index. This might be useful for developer, but is this information meaningful for the user? If one changes #hash#PRIMARY to #hash#IDX2 (suppose there is an index called IDX2 which has the same definition as primary key), or to #hash#$hj, will it be a different query plan? I think it should be "<hashkey>" always. ISSUE3 For hash_range, "key" column shows {hashkey}:{range_key}, which allows to determine that which key was used for range access. For hash_index type, "key" column shows only the hash's pseudo-index: MariaDB [j3]> EXPLAIN SELECT t2.i FROM t1,t2 WHERE t1.cu = t2.cl; +----+-------------+-------+------------+---------------+-----------+---------+------+------+--------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------------+---------------+-----------+---------+------+------+--------------------------------------------------+ | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 6 | | | 1 | SIMPLE | t1 | hash_index | cu1,cu2 | #hash#cu1 | 33 | func | 10 | Using where; Using join buffer (flat, BNLH join) | +----+-------------+-------+------------+---------------+-----------+---------+------+------+--------------------------------------------------+ What is the index that's used for scanning the table t1? "#hash#cu1" shows that hash join optimizer used access on index "cu1" in its analysis, but generally it doesn't mean that full index scan on table t1 was done on index cu1, doesn't it? I would very much prefer to see #hash#cu1:USED_INDEX there (or taking into account previous suggestions, hashkey:USED_INDEX). It will also be consistent with how hash_range and hash_index_merge use that column. ISSUE4 If key and key_len use hash_part:real_part notation, i.e. use semicolon, why use underscore in hash_ALL, hash_index, etc? Won't hash:ALL, hash:index look more consistent with other columns? BR Sergey -- Sergey Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog

On 02/21/2011 12:30 PM, Sergey Petrunya wrote:
On Fri, Feb 11, 2011 at 08:49:37PM -0800, Igor Babaev wrote:
Hi Sergey,
Please review this patch that fixes a defect of the current implementation of the mwl #128 that never couples hash join with range/index_merge scans. The patch also adds prefix #hash# to the names of the used hash indexes. (we agreed upon this at the last optimizer meeting).
Regards, Igor.
-------- Original Message -------- Subject: [Commits] Rev 2901: BNLH algorithm always used a full table scan over the joined table in file:///home/igor/maria/maria-5.3-mwl128-hashrange/ Date: Fri, 11 Feb 2011 20:41:23 -0800 (PST) From: Igor Babaev <igor@askmonty.org> Reply-To: maria-developers@lists.launchpad.net To: <commits@mariadb.org>
At file:///home/igor/maria/maria-5.3-mwl128-hashrange/
------------------------------------------------------------ revno: 2901 revision-id: igor@askmonty.org-20110212044122-w9n3jdk3d2ps0w3o parent: igor@askmonty.org-20110207231903-3rqbfs50d33lk3r9 committer: Igor Babaev <igor@askmonty.org> branch nick: maria-5.3-mwl128-hashrange timestamp: Fri 2011-02-11 20:41:22 -0800 message: BNLH algorithm always used a full table scan over the joined table even in the cases when there existed range/index-merge scans that were cheaper than the full table scan. This was a defect/bug of the implementation of mwl #128. Now hash join can work not only with full table scan of the joined table, but also with full index scan, range and index-merge scans. Accordingly, in the cases when hash join is used the column 'type' in the EXPLAINs can contain now 'hash_ALL', 'hash_index', 'hash_range' and 'hash_index_merge'. If hash join is coupled with a range/index_merge scan then the columns 'key' and 'key_len' contain info not only on the used hash index, but also on the indexes used for the scan. ... === modified file 'sql/sql_select.cc' --- a/sql/sql_select.cc 2011-02-06 04:57:03 +0000 +++ b/sql/sql_select.cc 2011-02-12 04:41:22 +0000 ... @@ -8419,6 +8438,11 @@ sort_by_tab->type= JT_ALL; sort_by_tab->read_first_record= join_init_read_record; } + else if (sort_by_tab->type == JT_HASH_NEXT) + { + sort_by_tab->type= JT_HASH; + sort_by_tab->read_first_record= join_init_read_record; + } } break; } I have a question only to this part of the patch: I don't understand how the above code could work.
The if statement above the one you've added checks if sort_by_tab was using [ordered] index scan to produce records in orderer, and if yes, switches it to a full scan (because use of join buffering will break ordering anyway).
I suppose the part you've added does something similar, but I totally fail to understand what exactly it does.
The above code says: if hash join is used do not use full index scan to look through the joined table, rather use full table scan in this case.
Now, some comments on the new EXPLAIN output:
ISSUE1.
Consider a query: MariaDB [j1]> explain select * from t3, t4 where t3.col1=t4.col2; +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ | 1 | SIMPLE | t3 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 1 | SIMPLE | t4 | hash_ALL | NULL | #hash#$hj | 5 | j1.t3.col1 | 1000 | Using where; Using join buffer (flat, BNLH join) | +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ 2 rows in set (0.00 sec)
AFAIU #hash#$hj means the hash key on the hashtable that's used for hash join. I think the name is too convoluted. The hashtable has only one key, if we need to tell it from t4's indexes it would be sufficient to use "#hashkey#", without the cryptic "$hj".
As for "#" characters: EXPLAIN already shows "internal" tables/indexes:
If you look into #maria-call log you'll see that I followed strictly Monty's instructions: to use the prefix #hash# before the index name I used to use, i.e. <used_index> should be substituted for #hash#<used_index>. The only deviation I afforded myself was: I used $hj instead of hj_key.
MariaDB [j1]> explain select * from (select * from t1) X; +----+-------------+------------+------+---------------+------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+------+---------------+------+---------+------+------+-------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1000 | | | 2 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | | +----+-------------+------------+------+---------------+------+---------+------+------+-------+
# The following is from MWL#90 tree: MariaDB [test]> explain select * from t10 where a in (select max(b) from t10 group by a); +----+-------------+-------------+--------+---------------+--------------+---------+------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+--------+---------------+--------------+---------+------------+------+---------------------------------+ | 1 | PRIMARY | t10 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 5 | test.t10.a | 1 | | | 2 | SUBQUERY | t10 | ALL | NULL | NULL | NULL | NULL | 1000 | Using temporary; Using filesort | +----+-------------+-------------+--------+---------------+--------------+---------+------------+------+---------------------------------+ 3 rows in set (0.01 sec)
# The following is from current 5.3: MariaDB [j1]> explain select distinct * from t1 where a in (select b from t2); +----+-------------+------------+--------+---------------+------------+---------+------+------+-----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+---------------+------------+---------+------+------+-----------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using temporary | | 1 | PRIMARY | subselect2 | eq_ref | unique_key | unique_key | 5 | func | 1 | | | 2 | SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 1000 | Distinct | +----+-------------+------------+--------+---------------+------------+---------+------+------+-----------------+
... and it either uses no quoting at all (see "unique_key"), or angle brackets quoting (see <subquery2>). I think we should not multiply the ways of quoting, so please consider switching to <quoting> or to not using any quoting at all.
We use # in generated names for temporary tables. I don't see any problem here.
ISSUE2
MariaDB [j1]> EXPLAIN -> SELECT City.Name, Country.Name, CountryLanguage.Language -> FROM City,Country,CountryLanguage -> WHERE City.Country=Country.Code AND -> CountryLanguage.Country=Country.Code AND -> City.Name LIKE 'L%' AND Country.Population > 3000000 AND -> CountryLanguage.Percentage > 50 AND -> LENGTH(Language) < LENGTH(City.Name) - 2; +----+-------------+-----------------+----------+--------------------+---------------+---------+----------------------------+------+--------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+----------+--------------------+---------------+---------+----------------------------+------+--------------------------------------------------+ | 1 | SIMPLE | CountryLanguage | ALL | PRIMARY,Percentage | NULL | NULL | NULL | 984 | Using where | | 1 | SIMPLE | Country | hash_ALL | PRIMARY | #hash#PRIMARY | 3 | j1.CountryLanguage.Country | 239 | Using where; Using join buffer (flat, BNLH join) | | 1 | SIMPLE | City | hash_ALL | Country | #hash#Country | 3 | j1.CountryLanguage.Country | 4079 | Using where; Using join buffer (flat, BNLH join) | +----+-------------+-----------------+----------+--------------------+---------------+---------+----------------------------+------+--------------------------------------------------+ 3 rows in set (0.01 sec)
"key" column shows "#hash#PRIMARY". I suppose, this is because hash join opimizer is internally hooked into ref optimizer, and it used the PRIMARY index. This might be useful for developer, but is this information meaningful for the user? If one changes #hash#PRIMARY to #hash#IDX2 (suppose there is an index called IDX2 which has the same definition as primary key), or to #hash#$hj, will it be a different query plan? I think it should be "<hashkey>" always.
We can use (and are going to use it) statistics provided for used_index. So #hash#<associated_index> is quite relevant here.
ISSUE3 For hash_range, "key" column shows {hashkey}:{range_key}, which allows to determine that which key was used for range access. For hash_index type, "key" column shows only the hash's pseudo-index:
MariaDB [j3]> EXPLAIN SELECT t2.i FROM t1,t2 WHERE t1.cu = t2.cl; +----+-------------+-------+------------+---------------+-----------+---------+------+------+--------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------------+---------------+-----------+---------+------+------+--------------------------------------------------+ | 1 | SIMPLE | t2 | ALL | NULL | NULL | NULL | NULL | 6 | | | 1 | SIMPLE | t1 | hash_index | cu1,cu2 | #hash#cu1 | 33 | func | 10 | Using where; Using join buffer (flat, BNLH join) | +----+-------------+-------+------------+---------------+-----------+---------+------+------+--------------------------------------------------+
What is the index that's used for scanning the table t1? "#hash#cu1" shows that hash join optimizer used access on index "cu1" in its analysis, but generally it doesn't mean that full index scan on table t1 was done on index cu1, doesn't it? I would very much prefer to see #hash#cu1:USED_INDEX there (or taking into account previous suggestions, hashkey:USED_INDEX). It will also be consistent with how hash_range and hash_index_merge use that column.
Yes, the scan index is missing here. I will add.
ISSUE4 If key and key_len use hash_part:real_part notation, i.e. use semicolon, why use underscore in hash_ALL, hash_index, etc? Won't hash:ALL, hash:index look more consistent with other columns?
again: I follow here Monty's recommendations. Nothing more, nor less. Regards, Igor.
BR Sergey

On Mon, Feb 21, 2011 at 03:20:13PM -0800, Igor Babaev wrote:
=== modified file 'sql/sql_select.cc' --- a/sql/sql_select.cc 2011-02-06 04:57:03 +0000 +++ b/sql/sql_select.cc 2011-02-12 04:41:22 +0000 ... @@ -8419,6 +8438,11 @@ sort_by_tab->type= JT_ALL; sort_by_tab->read_first_record= join_init_read_record; } + else if (sort_by_tab->type == JT_HASH_NEXT) + { + sort_by_tab->type= JT_HASH; + sort_by_tab->read_first_record= join_init_read_record; + } } break; } I have a question only to this part of the patch: I don't understand how the above code could work.
The if statement above the one you've added checks if sort_by_tab was using [ordered] index scan to produce records in orderer, and if yes, switches it to a full scan (because use of join buffering will break ordering anyway).
I suppose the part you've added does something similar, but I totally fail to understand what exactly it does.
The above code says: if hash join is used do not use full index scan to look through the joined table, rather use full table scan in this case.
I've put a DBUG_ASSERT(0) right after the "sort_by_tab->type= JT_HASH;" line and ran the testsuite. The assertion didn't fire. I did this because I suspect (although can't provide a sound proof right away) that the part inside the else if () { ...} is deadcode. Do you think you could easily prove me wrong by giving a testcase? If constructing testcase is complicated, let's meet online and discuss my suspicions about the deadcode.
Now, some comments on the new EXPLAIN output:
ISSUE1.
Consider a query: MariaDB [j1]> explain select * from t3, t4 where t3.col1=t4.col2; +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ | 1 | SIMPLE | t3 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 1 | SIMPLE | t4 | hash_ALL | NULL | #hash#$hj | 5 | j1.t3.col1 | 1000 | Using where; Using join buffer (flat, BNLH join) | +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+ 2 rows in set (0.00 sec)
AFAIU #hash#$hj means the hash key on the hashtable that's used for hash join. I think the name is too convoluted. The hashtable has only one key, if we need to tell it from t4's indexes it would be sufficient to use "#hashkey#", without the cryptic "$hj".
As for "#" characters: EXPLAIN already shows "internal" tables/indexes:
If you look into #maria-call log you'll see that I followed strictly Monty's instructions: to use the prefix #hash# before the index name I used to use, i.e. <used_index> should be substituted for #hash#<used_index>.
The only deviation I afforded myself was: I used $hj instead of hj_key.
Ok. I still think that my objections are valid and current EXPLAIN output is not the best but we could discuss/address this outside the scope of this bugfix. <skip>
What is the index that's used for scanning the table t1? "#hash#cu1" shows that hash join optimizer used access on index "cu1" in its analysis, but generally it doesn't mean that full index scan on table t1 was done on index cu1, doesn't it? I would very much prefer to see #hash#cu1:USED_INDEX there (or taking into account previous suggestions, hashkey:USED_INDEX). It will also be consistent with how hash_range and hash_index_merge use that column.
Yes, the scan index is missing here. I will add.
OK.
ISSUE4 If key and key_len use hash_part:real_part notation, i.e. use semicolon, why use underscore in hash_ALL, hash_index, etc? Won't hash:ALL, hash:index look more consistent with other columns?
again: I follow here Monty's recommendations. Nothing more, nor less.
BR Sergey -- Sergey Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog
participants (2)
-
Igor Babaev
-
Sergey Petrunya