On Thu, Dec 01, 2011 at 01:12:00PM +0200, Timour Katchaounov wrote:
Ok, I agree to change to MATERIALIZED (not MATERIALIZE). Who will do the change?
In addition, in the EXTRA field there should be info which materialization is used. If I am not mistaken, there are three variants: - SJ-mat, - SJ-inside-out, - NON-SJ-mat (better terms are needed for explain).
I do not fully agree to this. - One can easily tell between semi-join and non-semi-join materialization: in semi-join case you see a separate line with table='<subquery2>' - For semi-join materialization, lookup and scan variants are easily visible. Check the next two explains: the first does eq_ref access to <subquery2> table, this is SJ-Materialization-Lookup. The second does a full scan on <subquery2>, this is SJ-Materialization-Scan. I thought this was apprent enough. Do you think it is not? MariaDB [test]> explain select * from one_k A where a in (select B.a from ten B, ten C); +----+-------------+-------------+--------+---------------+--------------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+--------+---------------+--------------+---------+------+------+-------+ | 1 | PRIMARY | A | ALL | NULL | NULL | NULL | NULL | 1000 | | | 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 5 | func | 1 | | | 2 | SUBQUERY | B | ALL | NULL | NULL | NULL | NULL | 10 | | | 2 | SUBQUERY | C | ALL | NULL | NULL | NULL | NULL | 10 | | +----+-------------+-------------+--------+---------------+--------------+---------+------+------+-------+ 4 rows in set (0.00 sec) MariaDB [test]> set debug_optimizer_prefer_join_prefix='B,C,A'; Query OK, 0 rows affected (0.00 sec) MariaDB [test]> explain select * from one_k A where a in (select B.a from ten B, ten C); +----+-------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+ | 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 10 | | | 1 | PRIMARY | A | ALL | NULL | NULL | NULL | NULL | 1000 | Using where; Using join buffer (flat, BNL join) | | 2 | SUBQUERY | B | ALL | NULL | NULL | NULL | NULL | 10 | | | 2 | SUBQUERY | C | ALL | NULL | NULL | NULL | NULL | 10 | | +----+-------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+ 4 rows in set (0.00 sec)
Timour
On 1.12.2011 12:54, Sergei Petrunia wrote:
Reply-To:
Hello,
I think we have already discussed this before (multiple times), and the idea was received with some conditional approval of the idea. Nothing has been done or put to paper, though, so now I'm posting this in writing.
I'm posting this now, because we're close to RC and I think it's better to do this kind of changes before the RC.
The proposal is: if a subquery is materialized (that is, the subquery is run and it's output is stored with a temporary table with unique key), EXPLAIN should not show "SUBQUERY" (like it does now), it should show "MATERIALIZE".
Consider the following two queries. The EXPLAINs are the same, although one of them runs materialization and the other does not:
MariaDB [test]> EXPLAIN select * from ten where a< any (select max(a) from ten); +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | 1 | PRIMARY | ten | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 2 | SUBQUERY | ten | ALL | NULL | NULL | NULL | NULL | 10 | | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ 2 rows in set (0.00 sec)
MariaDB [test]> EXPLAIN select * from ten where a = any (select max(a) from ten) or a< 3; +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | 1 | PRIMARY | ten | ALL | NULL | NULL | NULL | NULL | 10 | Using where | | 2 | SUBQUERY | ten | ALL | NULL | NULL | NULL | NULL | 10 | | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ 2 rows in set (0.00 sec)
You can only tell that the second one uses Materialization if look at EXPLAIN EXTENDED warning text. I think it is extremely bad that radically new execution strategy lurks somewhere inside EXPLAIN EXTENDED.
I remember you've used to argue that 'SUBUQERY' represents the fact that the subquqery is evaluated once, and materialization should be shown elsewhere. I object to that argument - materialization is currently shown *nowhere* (except EXPLAIN EXTENDED which is pain to read except for the most simplest queries) - Derived table subqueries, which are also executed once (and stored in a temporary table, like with materialization), have their special select_type value 'DERIVED':
MariaDB [test]> explain select * from one_k, (select max(A.a+B.a) from ten A, ten B ) foo; +----+-------------+------------+------+---------------+------+---------+------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+------+---------------+------+---------+------+------+------------------------------------+ | 1 | PRIMARY |<derived2> | ALL | NULL | NULL | NULL | NULL | 100 | | | 1 | PRIMARY | one_k | ALL | NULL | NULL | NULL | NULL | 1000 | Using join buffer (flat, BNL join) | | 2 | DERIVED | A | ALL | NULL | NULL | NULL | NULL | 10 | | | 2 | DERIVED | B | ALL | NULL | NULL | NULL | NULL | 10 | Using join buffer (flat, BNL join) | +----+-------------+------------+------+---------------+------+---------+------+------+------------------------------------+ 4 rows in set (0.02 sec)
I think the above should be sufficient to make a decision to change Materialized subqueries to show 'MATERIALIZED' (or 'MATERIALIZE') in EXPLAIN output. Any objections?
-- BR Sergei -- Sergei Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog