[Maria-developers] Updated (by Guest): Subquery optimization: Efficient NOT IN execution with NULLs (68)
by worklog-noreply@askmonty.org 27 Feb '10
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subquery optimization: Efficient NOT IN execution with NULLs
CREATION DATE..: Fri, 27 Nov 2009, 13:22
SUPERVISOR.....: Monty
IMPLEMENTOR....: Timour
COPIES TO......:
CATEGORY.......: Server-Sprint
TASK ID........: 68 (http://askmonty.org/worklog/?tid=68)
VERSION........: Server-9.x
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Guest - Sat, 27 Feb 2010, 10:11)=-=-
Status updated.
No change.
-=-=(Guest - Sat, 27 Feb 2010, 10:11)=-=-
Status updated.
--- /tmp/wklog.68.old.24229 2010-02-27 10:11:57.000000000 +0000
+++ /tmp/wklog.68.new.24229 2010-02-27 10:11:57.000000000 +0000
@@ -1 +1 @@
-Assigned
+In-Progress
-=-=(Timour - Mon, 22 Feb 2010, 17:39)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.17116 2010-02-22 17:39:48.000000000 +0200
+++ /tmp/wklog.68.new.17116 2010-02-22 17:39:48.000000000 +0200
@@ -233,6 +233,7 @@
1. If columns a_j1,...,a_jm do not contain null values in the temporary
table at all and v_j1,...,v_jm cannot be null, create for these columns
only one index array (and of course do not create any bitmaps for them).
+[done]
2. Consider the ratio d(a_i)=N'(a_i)/V(a_i), where N'(a_i) is the number
of rows, where a_i is not null and V(a_i) is the number of distinct
@@ -264,6 +265,10 @@
7. If you get a row with nulls in all columns stop filling the temporary
table and return UNKNOWN for any tuple <v1,...,vn>.
+[This is wrong, because if we don't fill the whole temp table, there may
+ be some tuple(s) that would match some outer tuple. In such cases, if we
+ stop filling the temp table, we would miss a TRUE result. Having a partial
+ match doesn't preclude us from having a complete match].
8. [timour]
Consider that due to materialization, we already have a unique index
-=-=(Timour - Tue, 19 Jan 2010, 18:44)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.22569 2010-01-19 18:44:01.000000000 +0200
+++ /tmp/wklog.68.new.22569 2010-01-19 18:44:01.000000000 +0200
@@ -132,11 +132,10 @@
if (nonull_key && ! nonull_key->lookup(outer_ref))
return FALSE
- if (nonull_key)
- pq.insert(nonull_key)
for (i = 1; i <= n; i++)
{
+ if (vkey[i] != nonull_key)
vkey[i].lookup(outer_ref)
if (! vkey[i].is_eof())
pq.insert(i)
@@ -167,7 +166,7 @@
/* There cannot be a complete match, as we already checked for one. */
assert(matching_keys.elements < n)
}
- else if (cur_min_key == nonull_key)
+ else if (vkey[cur_min_key] == nonull_key)
{
/*
The non-NULL key has no corresponding NULL index, so we know for
@@ -183,8 +182,10 @@
/*
Check if all null_keys contain a NULL at row 'min_row'. The procedure
internally checks all keys in a special precomputed order. A prior
- procedure determines an optimal order and a mapping
- idx_no -> idx_order (encoded as an array).
+ procedure determines an optimal order and a mapping idx_no -> idx_order
+ (encoded as an array).
+
+ This procedure makes sure not to match the non-NULL column.
*/
if (test_null_row(null_keys, min_row))
return TRUE
@@ -198,6 +199,14 @@
vkey[cur_min_key].next()
if (! vkey[cur_min_key].is_eof())
pq.insert(cur_min_key)
+ else if (vkey[cur_min_key] == nonull_key)
+ {
+ /*
+ If there can't be more matches for the nonull_key, we know for sure
+ there is no match, since there is no possible NULL match.
+ */
+ return FALSE
+ }
if (pq.is_empty())
{
@@ -216,7 +225,6 @@
}
-
3. Directions for improvement
========================================================================
-=-=(Timour - Tue, 19 Jan 2010, 18:29)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.21045 2010-01-19 18:29:12.000000000 +0200
+++ /tmp/wklog.68.new.21045 2010-01-19 18:29:12.000000000 +0200
@@ -132,6 +132,8 @@
if (nonull_key && ! nonull_key->lookup(outer_ref))
return FALSE
+ if (nonull_key)
+ pq.insert(nonull_key)
for (i = 1; i <= n; i++)
{
-=-=(Guest - Tue, 19 Jan 2010, 18:15)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.19825 2010-01-19 18:15:30.000000000 +0200
+++ /tmp/wklog.68.new.19825 2010-01-19 18:15:30.000000000 +0200
@@ -1,8 +1,16 @@
-This a copy of the initial algorithm proposed by Igor:
-======================================================
+Contents
+========================================================================
-For each left side tuple (v_1,...,v_n) we have to find the following set
-of rowids for the temp table containing N rows as the result of
+1. Initial idea as proposed by Igor
+2. Algorithm for IN execution with partial matching
+3. Directions for improvement
+
+
+1. Initial idea as proposed by Igor
+========================================================================
+
+For each left side tuple (v_1,...,v_n) we have to find the following
+set of rowids for the temp table containing N rows as the result of
materialization of the subquery:
R= INTERSECT (rowid{a_i=v_i} UNION rowid{a_i is null} where i runs
@@ -18,38 +26,198 @@
- it requires minimum memory: not more than N*n bits in total
- search of an element in a set is extremely cheap
-Taken all above into account I could suggest the following algorithm to
-build R:
+Taken all above into account I could suggest the following algorithm
+to build R:
- Using indexes (read about them below) for each column participating in the
- intersection,
- merge ordered sets rowid{a_i=v_i} in the following manner.
+ Using indexes (read about them below) for each column participating
+ in the intersection, merge ordered sets rowid{a_i=v_i} in the
+ following manner.
If a rowid r has been encountered maximum in k sets
-rowid{a_i1=v_i1},...,rowid(a_ik=v_ik),
+ rowid{a_i1=v_i1},...,rowid(a_ik=v_ik),
then it has to be checked against all rowid{a_i=v_i} such that i is
-not in {i1,...,ik}.
+ not in {i1,...,ik}.
As soon as we fail to find r in one of these sets we discard it.
If r has been found in all of them then r belongs to the set R.
-Here we use the property (1): any r from rowid{a_i=v_i} UNION rowid{a_i
-is null} is either
+Here we use the property (1):
+any r from rowid{a_i=v_i} UNION rowid{a_i is null} is either
belongs to rowid{a_i=v_i} or to rowid{a_i is null}. From this we can
-infer that for any r from R
-indexes a_i can be uniquely divided into two groups: one contains
-indexes a_i where r belongs to
-the sets rowid{a_i=v_i}, the other contains indexes a_j such that r
-belongs to rowid{a_j is null}.
-
-Now let's talk how to get elements from rowid{a_i=v_i} in a sorted order
-needed for the merge procedure. We could use BTREE indexes for temp
-table. But they are rather expensive and
-take a lot of memory as the are implemented with RB trees.
+infer that for any r from R indexes a_i can be uniquely divided into
+two groups:
+- one contains indexes a_i where r belongs to the sets rowid{a_i=v_i},
+- the other contains indexes a_j such that r belongs to
+ rowid{a_j is null}.
+
+Now let's talk how to get elements from rowid{a_i=v_i} in a sorted
+order needed for the merge procedure. We could use BTREE indexes for
+temp table. But they are rather expensive and take a lot of memory as
+the are implemented with RB trees.
I would suggest creating for each column from the temporary table just
an array of rowids sorted by the value from column a.
Index lookup in such an array is cheap. It's also rather cheap to check
that the next rowid refers to a row with a different value in column a.
The array can be created on demand.
+2. Algorithm for IN execution with partial matching
+========================================================================
+
+2.1 Below is shown the top-level algorithm to execute an IN predicate
+with partial matching. This algorithm is essentially the implementation
+of Item_subselect:exec().
+
+int lookup_with_null_semantics(outer_ref[], mat_subquery)
+{
+ if (index_lookup(outer_ref, mat_subquery)
+ return TRUE
+ else
+ {
+ /*
+ Check if there is a partial match (UNKNOWN) or no match (NULL).
+ */
+ if (this is the first partial match)
+ {
+ vkey[] = build array of value keys for each NULL-able column
+ of mat_subquery.
+ nkey[] = build a bitmap NULL index for each column of mat_subquery
+ that contains NULLs
+ nonull_key = build a key over all non-NULL columns of mat_subquery
+ }
+ if (partial_match(outer_ref, vkey[], nkey[], nonull_key)
+ return UNKNOWN
+ else
+ return FALSE
+ }
+}
+
+2.2 The implementation of partial matching is as follows
+
+/*
+ Assumptions:
+ - It has already been checked if there is a complete match by a
+ regular index lookup, and the test failed.
+ - It has already been checked if there is a complete NULL row,
+ and if there was we wouldn't call this function. Thus we assume
+ that there is no complete NULL row.
+ - Not all vidx_i are empty, but some can be empty. If all were empty,
+ then the only possibility for a match is a complete NULL row, which
+ we already checked.
+
+ @param outer_ref - the uter (left) IN argument.
+ @param vidx[] - array of value keys
+ Ordered sequences of rowids of the corresponding columns a_i, such
+ that all rowids in idx_i are the ones where column a_i contains some
+ value or NULL. Each idx_i is derived dynamically, for each different
+ left argument of an IN predicate.
+ @param nidx[] - array of NULL keys
+ Bitmpas, one per each column, where a bit is set if the corresponding
+ row has a NULL value for the corresponding column.
+ @nonull_key - the only key over all columns of the materialized subquery
+ that do not contain NULLs
+
+ @returns
+ @retval FALSE if there is no match
+ @retval TRUE if there is a partial match
+*/
+
+Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key)
+{
+ /* Set of the keys (columns) that form a partial match. */
+ Set matching_keys = {}
+ /* A subset of all keys that need to be checked for NULL matches. */
+ Set null_keys = {}
+ Int min_key /* Key that contains the current minimum position. */
+ Int min_row /* Current row number of min_key. */
+ Int cur_min_key, cur_min_row
+ PriorityQueue pq
+
+ if (nonull_key && ! nonull_key->lookup(outer_ref))
+ return FALSE
+
+ for (i = 1; i <= n; i++)
+ {
+ vkey[i].lookup(outer_ref)
+ if (! vkey[i].is_eof())
+ pq.insert(i)
+ }
+ /*
+ Not all value keys are empty, thus we don't have only NULL
+ keys. If we had, the only possible match is a NULL row, and
+ we cheked there is no such row, therefore the result is known
+ to be FALSE.
+ In fact this algorithm makes sense for at least two non-NULL
+ columns.
+ */
+ assert(pq.elements > 1)
+
+ (min_key, min_row) = pq.pop()
+ matching_keys.add(min_key)
+ vkey[min_key].next()
+ if (! vkey[min_key].is_eof())
+ pq.insert(min_key)
+
+ while (TRUE)
+ {
+ (cur_min_key, cur_min_row) = pq.pop()
+
+ if (cur_min_row == min_row)
+ {
+ matching_keys.add(cur_min_key)
+ /* There cannot be a complete match, as we already checked for one. */
+ assert(matching_keys.elements < n)
+ }
+ else if (cur_min_key == nonull_key)
+ {
+ /*
+ The non-NULL key has no corresponding NULL index, so we know for
+ sure that the row 'min_row' is not a match.
+ */
+ (min_key, min_row) = (cur_min_key, cur_min_row)
+ matching_keys = {min_key}
+ }
+ else
+ {
+ assert(cur_min_row > min_row) /* Follows from the use of PQ. */
+ null_keys = set_difference(all keys vkey[], matching_keys)
+ /*
+ Check if all null_keys contain a NULL at row 'min_row'. The procedure
+ internally checks all keys in a special precomputed order. A prior
+ procedure determines an optimal order and a mapping
+ idx_no -> idx_order (encoded as an array).
+ */
+ if (test_null_row(null_keys, min_row))
+ return TRUE
+ else
+ {
+ (min_key, min_row) = (cur_min_key, cur_min_row)
+ matching_keys = {min_key}
+ }
+ }
+
+ vkey[cur_min_key].next()
+ if (! vkey[cur_min_key].is_eof())
+ pq.insert(cur_min_key)
+
+ if (pq.is_empty())
+ {
+ /* Check the last row of the last column in PQ for NULL matches. */
+ null_keys = set_difference(all keys vkey[], matching_keys)
+ if (test_null_row(null_keys, min_row))
+ return TRUE
+ else
+ return FALSE
+ }
+ }
+
+ /* We should never get here. */
+ assert(FALSE)
+ return FALSE
+}
+
+
+
+3. Directions for improvement
+========================================================================
+
Other consideration that may be taken into account:
1. If columns a_j1,...,a_jm do not contain null values in the temporary
-=-=(Timour - Sun, 06 Dec 2009, 14:36)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.12919 2009-12-06 14:36:18.000000000 +0200
+++ /tmp/wklog.68.new.12919 2009-12-06 14:36:18.000000000 +0200
@@ -87,3 +87,8 @@
7. If you get a row with nulls in all columns stop filling the temporary
table and return UNKNOWN for any tuple <v1,...,vn>.
+8. [timour]
+ Consider that due to materialization, we already have a unique index
+on all columns <a_1,..., a_n>. We can use the first key part of this index
+over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid
+creating the index rowid{a_i=v_i}.
-=-=(Timour - Fri, 04 Dec 2009, 14:04)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.16724 2009-12-04 14:04:28.000000000 +0200
+++ /tmp/wklog.68.new.16724 2009-12-04 14:04:28.000000000 +0200
@@ -10,7 +10,8 @@
Bear in mind the following specifics of this intersection:
(1) For each i: rowid{a_i=v_i} and rowid{a_i is null} are disjoint
- (2) For each i: rowid{a_i is null} is the same for each tuple
+ (2) For each i: rowid{a_i is null} is the same for each tuple,
+ that is, this set is independent of the left-side tuples.
Due to (2) it makes sense to build rowid{a_i is null} only once.
A good representation for such sets would be bitmaps:
-=-=(Timour - Fri, 04 Dec 2009, 11:27)=-=-
Version updated.
--- /tmp/wklog.68.old.5257 2009-12-04 11:27:11.000000000 +0200
+++ /tmp/wklog.68.new.5257 2009-12-04 11:27:11.000000000 +0200
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Timour - Fri, 04 Dec 2009, 11:27)=-=-
Category updated.
--- /tmp/wklog.68.old.5242 2009-12-04 11:27:02.000000000 +0200
+++ /tmp/wklog.68.new.5242 2009-12-04 11:27:02.000000000 +0200
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-Sprint
------------------------------------------------------------
-=-=(View All Progress Notes, 13 total)=-=-
http://askmonty.org/worklog/index.pl?tid=68&nolimit=1
DESCRIPTION:
The goal of this task is to implement efficient execution of NOT IN
subquery predicates of the form:
<oe_1,...,oe_n> NOT IN <non_correlated subquery>
when either some oe_i or some subquery result column contains NULLs.
The problem with such predicates is that index lookups can be used
only when neither argument of the predicate contains NULLs.
If some argument contains a NULL, then due to NULL semantics it
plays the role of a wildcard. If we were to use regular index lookups,
we would get 'no match' for some outer tuple (thus the predicate
would evaluate to FALSE), while SQL semantics calls for a 'partial match',
and the predicate should evaluate to NULL.
This task implements an efficient algorithm to compute such 'partial
matches', where a NULL matches any value.
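To make these semantics concrete, below is a minimal brute-force sketch in
Python, with None standing in for SQL NULL. It is only a reference model of
three-valued IN evaluation, not the algorithm proposed by this task; the
function name and data layout are invented for illustration.

def tuple_in(outer, rows):
    """SQL three-valued IN over row comparisons: True, False, or None (UNKNOWN)."""
    some_unknown = False
    for row in rows:
        per_column = [None if (v is None or a is None) else (v == a)
                      for v, a in zip(outer, row)]
        if all(c is True for c in per_column):
            return True                      # complete match
        if all(c is not False for c in per_column):
            some_unknown = True              # partial match via NULL wildcards
    return None if some_unknown else False   # NOT IN negates this; UNKNOWN stays UNKNOWN

# (1, NULL) NOT IN ((1, 2), (3, 4)): the row (1, 2) is a partial match,
# so IN evaluates to UNKNOWN and so does NOT IN.
assert tuple_in((1, None), [(1, 2), (3, 4)]) is None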
HIGH-LEVEL SPECIFICATION:
Contents
========================================================================
1. Initial idea as proposed by Igor
2. Algorithm for IN execution with partial matching
3. Directions for improvement
1. Initial idea as proposed by Igor
========================================================================
For each left side tuple (v_1,...,v_n) we have to find the following
set of rowids for the temp table containing N rows as the result of
materialization of the subquery:
R = INTERSECT (rowid{a_i=v_i} UNION rowid{a_i is null}), where i runs
through all indexes from [1..n] such that v_i is not null.
Bear in mind the following specifics of this intersection:
(1) For each i: rowid{a_i=v_i} and rowid{a_i is null} are disjoint
(2) For each i: rowid{a_i is null} is the same for each tuple,
that is, this set is independent of the left-side tuples.
Due to (2) it makes sense to build rowid{a_i is null} only once.
A good representation for such sets would be bitmaps:
- it requires minimum memory: not more than N*n bits in total
- search of an element in a set is extremely cheap
Taking all of the above into account, I could suggest the following
algorithm to build R:
Using indexes (read about them below) for each column participating
in the intersection, merge ordered sets rowid{a_i=v_i} in the
following manner.
If a rowid r has been encountered in some k sets
rowid{a_i1=v_i1},...,rowid{a_ik=v_ik},
then it has to be checked against all rowid{a_i=v_i} such that i is
not in {i1,...,ik}.
As soon as we fail to find r in one of these sets we discard it.
If r has been found in all of them then r belongs to the set R.
Here we use property (1):
any r from rowid{a_i=v_i} UNION rowid{a_i is null} either
belongs to rowid{a_i=v_i} or to rowid{a_i is null}. From this we can
infer that for any r from R indexes a_i can be uniquely divided into
two groups:
- one contains indexes a_i where r belongs to the sets rowid{a_i=v_i},
- the other contains indexes a_j such that r belongs to
rowid{a_j is null}.
Now let's discuss how to get elements from rowid{a_i=v_i} in the sorted
order needed for the merge procedure. We could use BTREE indexes for the
temp table, but they are rather expensive and take a lot of memory as
they are implemented with RB trees.
I would suggest creating, for each column a_i of the temporary table, just
an array of rowids sorted by the value of that column.
Index lookup in such an array is cheap. It's also rather cheap to check
that the next rowid refers to a row with a different value in the column.
The array can be created on demand.
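As an illustration of the per-column sorted rowid arrays and NULL bitmaps
described above, here is a minimal Python sketch. The class and helper names
are invented for illustration; only the data layout follows the text, and
on-demand creation and the cost considerations are left out.

import bisect

class ColumnValueIndex:
    """Sorted array of (value, rowid) pairs for the non-NULL rows of one
    temp-table column; lookups return rowids in ascending order."""
    def __init__(self, column):          # column: list of values indexed by rowid
        self.entries = sorted((v, r) for r, v in enumerate(column) if v is not None)
        self.values = [v for v, _ in self.entries]

    def rowids_equal(self, v):
        """The ordered set rowid{a_i = v} for this column."""
        lo = bisect.bisect_left(self.values, v)
        hi = bisect.bisect_right(self.values, v)
        return [r for _, r in self.entries[lo:hi]]

def null_bitmap(column):
    """The set rowid{a_i is null}, one flag per row."""
    return [v is None for v in column]

# Example: column a_1 of a four-row temp table.
a1 = [7, None, 7, 3]
assert ColumnValueIndex(a1).rowids_equal(7) == [0, 2]
assert null_bitmap(a1) == [False, True, False, False]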
2. Algorithm for IN execution with partial matching
========================================================================
2.1 Below is the top-level algorithm to execute an IN predicate
with partial matching. This algorithm is essentially the implementation
of Item_subselect::exec().
int lookup_with_null_semantics(outer_ref[], mat_subquery)
{
  if (index_lookup(outer_ref, mat_subquery))
    return TRUE
  else
  {
    /*
      Check if there is a partial match (UNKNOWN) or no match (NULL).
    */
    if (this is the first partial match)
    {
      vkey[] = build array of value keys for each NULL-able column
               of mat_subquery.
      nkey[] = build a bitmap NULL index for each column of mat_subquery
               that contains NULLs
      nonull_key = build a key over all non-NULL columns of mat_subquery
    }
    if (partial_match(outer_ref, vkey[], nkey[], nonull_key))
      return UNKNOWN
    else
      return FALSE
  }
}
2.2 The implementation of partial matching is as follows
/*
  Assumptions:
  - It has already been checked if there is a complete match by a
    regular index lookup, and the test failed.
  - It has already been checked if there is a complete NULL row,
    and if there was we wouldn't call this function. Thus we assume
    that there is no complete NULL row.
  - Not all vkey_i are empty, but some can be empty. If all were empty,
    then the only possibility for a match is a complete NULL row, which
    we already checked.

  @param outer_ref - the outer (left) IN argument.
  @param vkey[] - array of value keys
    Ordered sequences of rowids of the corresponding columns a_i, such
    that all rowids in vkey_i are the ones where column a_i contains some
    value or NULL. Each vkey_i is derived dynamically, for each different
    left argument of an IN predicate.
  @param nkey[] - array of NULL keys
    Bitmaps, one per column, where a bit is set if the corresponding
    row has a NULL value in the corresponding column.
  @param nonull_key - the only key over all columns of the materialized
    subquery that do not contain NULLs

  @returns
  @retval FALSE if there is no match
  @retval TRUE if there is a partial match
*/
Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key)
{
  /* Set of the keys (columns) that form a partial match. */
  Set matching_keys = {}
  /* A subset of all keys that need to be checked for NULL matches. */
  Set null_keys = {}
  Int min_key /* Key that contains the current minimum position. */
  Int min_row /* Current row number of min_key. */
  Int cur_min_key, cur_min_row
  PriorityQueue pq

  if (nonull_key && ! nonull_key->lookup(outer_ref))
    return FALSE

  for (i = 1; i <= n; i++)
  {
    if (vkey[i] != nonull_key)
      vkey[i].lookup(outer_ref)
    if (! vkey[i].is_eof())
      pq.insert(i)
  }
  /*
    Not all value keys are empty, thus we don't have only NULL
    keys. If we had, the only possible match is a NULL row, and
    we checked there is no such row, therefore the result is known
    to be FALSE.
    In fact this algorithm makes sense for at least two non-NULL
    columns.
  */
  assert(pq.elements > 1)

  (min_key, min_row) = pq.pop()
  matching_keys.add(min_key)
  vkey[min_key].next()
  if (! vkey[min_key].is_eof())
    pq.insert(min_key)

  while (TRUE)
  {
    (cur_min_key, cur_min_row) = pq.pop()

    if (cur_min_row == min_row)
    {
      matching_keys.add(cur_min_key)
      /* There cannot be a complete match, as we already checked for one. */
      assert(matching_keys.elements < n)
    }
    else if (vkey[cur_min_key] == nonull_key)
    {
      /*
        The non-NULL key has no corresponding NULL index, so we know for
        sure that the row 'min_row' is not a match.
      */
      (min_key, min_row) = (cur_min_key, cur_min_row)
      matching_keys = {min_key}
    }
    else
    {
      assert(cur_min_row > min_row) /* Follows from the use of PQ. */
      null_keys = set_difference(all keys vkey[], matching_keys)
      /*
        Check if all null_keys contain a NULL at row 'min_row'. The procedure
        internally checks all keys in a special precomputed order. A prior
        procedure determines an optimal order and a mapping idx_no -> idx_order
        (encoded as an array).

        This procedure makes sure not to match the non-NULL column.
      */
      if (test_null_row(null_keys, min_row))
        return TRUE
      else
      {
        (min_key, min_row) = (cur_min_key, cur_min_row)
        matching_keys = {min_key}
      }
    }

    vkey[cur_min_key].next()
    if (! vkey[cur_min_key].is_eof())
      pq.insert(cur_min_key)
    else if (vkey[cur_min_key] == nonull_key)
    {
      /*
        If there can't be more matches for the nonull_key, we know for sure
        there is no match, since there is no possible NULL match.
      */
      return FALSE
    }

    if (pq.is_empty())
    {
      /* Check the last row of the last column in PQ for NULL matches. */
      null_keys = set_difference(all keys vkey[], matching_keys)
      if (test_null_row(null_keys, min_row))
        return TRUE
      else
        return FALSE
    }
  }

  /* We should never get here. */
  assert(FALSE)
  return FALSE
}
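test_null_row() is used above but not spelled out. Below is a minimal Python
sketch, under the assumption that the NULL keys are the per-column NULL
bitmaps from section 1, passed in explicitly here as a hypothetical
null_bitmaps parameter, and ignoring the precomputed check order mentioned
in the comment above.

def test_null_row(null_keys, min_row, null_bitmaps):
    # TRUE iff every key (column) in null_keys has a NULL at row 'min_row',
    # i.e. the columns that did not match by value can still match as NULL wildcards.
    return all(null_bitmaps[key][min_row] for key in null_keys)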
3. Directions for improvement
========================================================================
Other considerations that may be taken into account:
1. If columns a_j1,...,a_jm do not contain null values in the temporary
table at all and v_j1,...,v_jm cannot be null, create for these columns
only one index array (and of course do not create any bitmaps for them).
[done]
2. Consider the ratio d(a_i)=N'(a_i)/V(a_i), where N'(a_i) is the number
of rows, where a_i is not null and V(a_i) is the number of distinct
values for a_i excluding nulls.
If d(a_i) is close to N'(a_i) then do not create any index array: check
whether there is a match running through the records that have been
filtered in. Anyway if d(a_i) is close to N'(a_i) then the intersection
with rowid{a_i=v_i} will not reduce the number of remaining rowids
significantly.
In other words, if V(a_i) exceeds some threshold there is no sense to
create an index for a_i.
If additionally N-N'(a_i) is small do not create a bitmap for this
column either.
3. If for a column a_i d(a_i) is not close to N'(a_i), but N-N'(a_i) is
small, a sorted array of rowids from the set rowid{a_i is null} can be
used instead of a bitmap.
4. We always have a match if R0 = INTERSECT rowid{a_i is null} is not
empty. Here i runs through all indexes from [1..n] such that v_i is not
null. For a given subset of columns this fact has to be checked only
once. It can be easily done with bitmap intersection (see the sketch
after this list).
5. If v1,...,vn can never be NULL, then indexes (sorted arrays) can be
created only for rows with nulls.
6. If v1,...,vn can never be NULL and the number of rows with nulls is
small, do not create indexes and do not create bitmaps.
7. If you get a row with nulls in all columns stop filling the temporary
table and return UNKNOWN for any tuple <v1,...,vn>.
[This is wrong, because if we don't fill the whole temp table, there may
be some tuple(s) that would match some outer tuple. In such cases, if we
stop filling the temp table, we would miss a TRUE result. Having a partial
match doesn't preclude us from having a complete match].
8. [timour]
Consider that due to materialization, we already have a unique index
on all columns <a_1,..., a_n>. We can use the first key part of this index
over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid
creating the index rowid{a_i=v_i}.
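The bitmap intersection mentioned in item 4 can be sketched as follows. This
assumes the per-column NULL bitmaps of section 1, here packed into Python
integers (bit r set when row r is NULL in that column); it only illustrates
the check and is not the server implementation.

def null_only_match_exists(null_bitmaps, outer, n_rows):
    # R0 = INTERSECT of rowid{a_i is null} over all i where the outer value v_i
    # is not NULL. If R0 is non-empty, some row is NULL in every such column,
    # so any outer tuple with this NULL pattern has at least a partial match.
    r0 = (1 << n_rows) - 1                # start with all rows
    for i, v in enumerate(outer):
        if v is not None:
            r0 &= null_bitmaps[i]
    return r0 != 0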
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
+ return UNKNOWN
+ else
+ return FALSE
+ }
+}
+
+2.2 The implementation of partial matching is as follows
+
+/*
+ Assumptions:
+ - It has already been checked if there is a complete match by a
+ regular index lookup, and the test failed.
+ - It has already been checked if there is a complete NULL row,
+ and if there was we wouldn't call this function. Thus we assume
+ that there is no complete NULL row.
+ - Not all vidx_i are empty, but some can be empty. If all were empty,
+ then the only possibility for a match is a complete NULL row, which
+ we already checked.
+
+ @param outer_ref - the uter (left) IN argument.
+ @param vidx[] - array of value keys
+ Ordered sequences of rowids of the corresponding columns a_i, such
+ that all rowids in idx_i are the ones where column a_i contains some
+ value or NULL. Each idx_i is derived dynamically, for each different
+ left argument of an IN predicate.
+ @param nidx[] - array of NULL keys
+ Bitmpas, one per each column, where a bit is set if the corresponding
+ row has a NULL value for the corresponding column.
+ @nonull_key - the only key over all columns of the materialized subquery
+ that do not contain NULLs
+
+ @returns
+ @retval FALSE if there is no match
+ @retval TRUE if there is a partial match
+*/
+
+Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key)
+{
+ /* Set of the keys (columns) that form a partial match. */
+ Set matching_keys = {}
+ /* A subset of all keys that need to be checked for NULL matches. */
+ Set null_keys = {}
+ Int min_key /* Key that contains the current minimum position. */
+ Int min_row /* Current row number of min_key. */
+ Int cur_min_key, cur_min_row
+ PriorityQueue pq
+
+ if (nonull_key && ! nonull_key->lookup(outer_ref))
+ return FALSE
+
+ for (i = 1; i <= n; i++)
+ {
+ vkey[i].lookup(outer_ref)
+ if (! vkey[i].is_eof())
+ pq.insert(i)
+ }
+ /*
+ Not all value keys are empty, thus we don't have only NULL
+ keys. If we had, the only possible match is a NULL row, and
+ we cheked there is no such row, therefore the result is known
+ to be FALSE.
+ In fact this algorithm makes sense for at least two non-NULL
+ columns.
+ */
+ assert(pq.elements > 1)
+
+ (min_key, min_row) = pq.pop()
+ matching_keys.add(min_key)
+ vkey[min_key].next()
+ if (! vkey[min_key].is_eof())
+ pq.insert(min_key)
+
+ while (TRUE)
+ {
+ (cur_min_key, cur_min_row) = pq.pop()
+
+ if (cur_min_row == min_row)
+ {
+ matching_keys.add(cur_min_key)
+ /* There cannot be a complete match, as we already checked for one. */
+ assert(matching_keys.elements < n)
+ }
+ else if (cur_min_key == nonull_key)
+ {
+ /*
+ The non-NULL key has no corresponding NULL index, so we know for
+ sure that the row 'min_row' is not a match.
+ */
+ (min_key, min_row) = (cur_min_key, cur_min_row)
+ matching_keys = {min_key}
+ }
+ else
+ {
+ assert(cur_min_row > min_row) /* Follows from the use of PQ. */
+ null_keys = set_difference(all keys vkey[], matching_keys)
+ /*
+ Check if all null_keys contain a NULL at row 'min_row'. The procedure
+ internally checks all keys in a special precomputed order. A prior
+ procedure determines an optimal order and a mapping
+ idx_no -> idx_order (encoded as an array).
+ */
+ if (test_null_row(null_keys, min_row))
+ return TRUE
+ else
+ {
+ (min_key, min_row) = (cur_min_key, cur_min_row)
+ matching_keys = {min_key}
+ }
+ }
+
+ vkey[cur_min_key].next()
+ if (! vkey[cur_min_key].is_eof())
+ pq.insert(cur_min_key)
+
+ if (pq.is_empty())
+ {
+ /* Check the last row of the last column in PQ for NULL matches. */
+ null_keys = set_difference(all keys vkey[], matching_keys)
+ if (test_null_row(null_keys, min_row))
+ return TRUE
+ else
+ return FALSE
+ }
+ }
+
+ /* We should never get here. */
+ assert(FALSE)
+ return FALSE
+}
+
+
+
+3. Directions for improvement
+========================================================================
+
Other consideration that may be taken into account:
1. If columns a_j1,...,a_jm do not contain null values in the temporary
-=-=(Timour - Sun, 06 Dec 2009, 14:36)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.12919 2009-12-06 14:36:18.000000000 +0200
+++ /tmp/wklog.68.new.12919 2009-12-06 14:36:18.000000000 +0200
@@ -87,3 +87,8 @@
7. If you get a row with nulls in all columns stop filling the temporary
table and return UNKNOWN for any tuple <v1,...,vn>.
+8. [timour]
+ Consider that due to materialization, we already have a unique index
+on all columns <a_1,..., a_n>. We can use the first key part of this index
+over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid
+creating the index rowid{a_i=v_i}.
-=-=(Timour - Fri, 04 Dec 2009, 14:04)=-=-
High-Level Specification modified.
--- /tmp/wklog.68.old.16724 2009-12-04 14:04:28.000000000 +0200
+++ /tmp/wklog.68.new.16724 2009-12-04 14:04:28.000000000 +0200
@@ -10,7 +10,8 @@
Bear in mind the following specifics of this intersection:
(1) For each i: rowid{a_i=v_i} and rowid{a_i is null} are disjoint
- (2) For each i: rowid{a_i is null} is the same for each tuple
+ (2) For each i: rowid{a_i is null} is the same for each tuple,
+ that is, this set is independent of the left-side tuples.
Due to (2) it makes sense to build rowid{a_i is null} only once.
A good representation for such sets would be bitmaps:
-=-=(Timour - Fri, 04 Dec 2009, 11:27)=-=-
Version updated.
--- /tmp/wklog.68.old.5257 2009-12-04 11:27:11.000000000 +0200
+++ /tmp/wklog.68.new.5257 2009-12-04 11:27:11.000000000 +0200
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x
-=-=(Timour - Fri, 04 Dec 2009, 11:27)=-=-
Category updated.
--- /tmp/wklog.68.old.5242 2009-12-04 11:27:02.000000000 +0200
+++ /tmp/wklog.68.new.5242 2009-12-04 11:27:02.000000000 +0200
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-Sprint
-=-=(Timour - Fri, 04 Dec 2009, 11:27)=-=-
Status updated.
--- /tmp/wklog.68.old.5242 2009-12-04 11:27:02.000000000 +0200
+++ /tmp/wklog.68.new.5242 2009-12-04 11:27:02.000000000 +0200
@@ -1 +1 @@
-Un-Assigned
+Assigned
------------------------------------------------------------
-=-=(View All Progress Notes, 12 total)=-=-
http://askmonty.org/worklog/index.pl?tid=68&nolimit=1
DESCRIPTION:
The goal of this task is to implement efficient execution of NOT IN
subquery predicates of the form:
<oe_1,...,oe_n> NOT IN <non_correlated subquery>
when either some oe_i or some subquery result column contains NULLs.
The problem with such predicates is that it is possible to use index
lookups only when neither argument of the predicate contains NULLs.
If some argument contains a NULL, then due to NULL semantics, it
plays the role of a wildcard. If we were to use regular index lookups,
then we would get 'no match' for some outer tuple (thus the predicate
evaluates to FALSE), while the SQL semantics means 'partial match', and
the predicate should evaluate to NULL.
This task implements an efficient algorithm to compute such 'partial
matches', where a NULL matches any value.
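For illustration (toy values, not taken from the task itself): suppose the
subquery materializes the two rows (1,2) and (3,NULL). Then (5,6) NOT IN
(subquery) is TRUE, because no row matches even with NULLs treated as
wildcards; (1,2) NOT IN (subquery) is FALSE, because there is a complete
match; and (3,7) NOT IN (subquery) is NULL, because (3,NULL) is a partial
match: the NULL column may or may not equal 7, so the IN predicate is
UNKNOWN and its negation stays UNKNOWN.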
HIGH-LEVEL SPECIFICATION:
Contents
========================================================================
1. Initial idea as proposed by Igor
2. Algorithm for IN execution with partial matching
3. Directions for improvement
1. Initial idea as proposed by Igor
========================================================================
For each left side tuple (v_1,...,v_n) we have to find the following
set of rowids for the temp table containing N rows as the result of
materialization of the subquery:
R = INTERSECT (rowid{a_i=v_i} UNION rowid{a_i is null}), where i runs
through all indexes from [1..n] such that v_i is not null.
Bear in mind the following specifics of this intersection:
(1) For each i: rowid{a_i=v_i} and rowid{a_i is null} are disjoint
(2) For each i: rowid{a_i is null} is the same for each tuple,
that is, this set is independent of the left-side tuples.
Due to (2) it makes sense to build rowid{a_i is null} only once.
A good representation for such sets would be bitmaps:
- it requires minimum memory: not more than N*n bits in total
- search of an element in a set is extremely cheap
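As an illustration of the bitmap representation above, here is a minimal C++
sketch (illustrative only; the class name and std::vector<bool> storage are
assumptions for this example, not the server's actual bitmap code). One such
bitmap is built once per column over the N materialized rows, and membership
tests are constant time:

  #include <cstddef>
  #include <vector>

  /* Per-column NULL set: bit r is set iff column a_i is NULL in row r. */
  class NullBitmap {
  public:
    explicit NullBitmap(std::size_t n_rows) : bits_(n_rows, false) {}
    void set_null(std::size_t row)      { bits_[row] = true; }
    bool is_null(std::size_t row) const { return bits_[row]; }
  private:
    std::vector<bool> bits_;  /* N bits; n columns need at most N*n bits. */
  };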
Taken all above into account I could suggest the following algorithm
to build R:
Using indexes (read about them below) for each column participating
in the intersection, merge ordered sets rowid{a_i=v_i} in the
following manner.
If a rowid r has been encountered in a maximum of k sets
rowid{a_i1=v_i1},...,rowid{a_ik=v_ik},
then it has to be checked against all rowid{a_i=v_i} such that i is
not in {i1,...,ik}.
As soon as we fail to find r in one of these sets we discard it.
If r has been found in all of them then r belongs to the set R.
Here we use the property (1):
any r from rowid{a_i=v_i} UNION rowid{a_i is null} either
belongs to rowid{a_i=v_i} or to rowid{a_i is null}. From this we can
infer that for any r from R indexes a_i can be uniquely divided into
two groups:
- one contains indexes a_i where r belongs to the sets rowid{a_i=v_i},
- the other contains indexes a_j such that r belongs to
rowid{a_j is null}.
Now let's talk about how to get elements from rowid{a_i=v_i} in the sorted
order needed for the merge procedure. We could use BTREE indexes for the
temp table. But they are rather expensive and take a lot of memory, as
they are implemented with RB trees.
I would suggest creating, for each column a_i of the temporary table, just
an array of rowids sorted by the value of a_i.
Index lookup in such an array is cheap. It's also rather cheap to check
that the next rowid refers to a row with a different value in a_i.
The array can be created on demand.
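A minimal sketch of such an on-demand "value key" (the names and int-typed
values are assumptions for this example; NULL handling and the real
storage-engine row format are omitted): the rowids are sorted by the column
value, so rowid{a_i=v_i} becomes a contiguous range found by binary search:

  #include <algorithm>
  #include <cstddef>
  #include <cstdint>
  #include <utility>
  #include <vector>

  using rowid_t = std::uint32_t;

  /* Sorted array of rowids for one column a_i of the temp table. */
  struct ValueKey {
    std::vector<int>     values;        /* value of a_i in each row        */
    std::vector<rowid_t> sorted_rowids; /* rowids ordered by values[rowid] */

    explicit ValueKey(std::vector<int> column) : values(std::move(column)) {
      sorted_rowids.resize(values.size());
      for (std::size_t r = 0; r < sorted_rowids.size(); ++r)
        sorted_rowids[r] = static_cast<rowid_t>(r);
      std::sort(sorted_rowids.begin(), sorted_rowids.end(),
                [this](rowid_t a, rowid_t b) { return values[a] < values[b]; });
    }

    /* rowid{a_i = v}: positions [first, last) in sorted_rowids with value v. */
    std::pair<std::size_t, std::size_t> lookup(int v) const {
      auto lo = std::lower_bound(
          sorted_rowids.begin(), sorted_rowids.end(), v,
          [this](rowid_t r, int val) { return values[r] < val; });
      auto hi = std::upper_bound(
          sorted_rowids.begin(), sorted_rowids.end(), v,
          [this](int val, rowid_t r) { return val < values[r]; });
      return {static_cast<std::size_t>(lo - sorted_rowids.begin()),
              static_cast<std::size_t>(hi - sorted_rowids.begin())};
    }
  };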
2. Algorithm for IN execution with partial matching
========================================================================
2.1 Below is the top-level algorithm to execute an IN predicate
with partial matching. This algorithm is essentially the implementation
of Item_subselect::exec().
int lookup_with_null_semantics(outer_ref[], mat_subquery)
{
  if (index_lookup(outer_ref, mat_subquery))
    return TRUE
  else
  {
    /*
      Check if there is a partial match (UNKNOWN) or no match (NULL).
    */
    if (this is the first partial match)
    {
      vkey[] = build array of value keys for each NULL-able column
               of mat_subquery.
      nkey[] = build a bitmap NULL index for each column of mat_subquery
               that contains NULLs
      nonull_key = build a key over all non-NULL columns of mat_subquery
    }
    if (partial_match(outer_ref, vkey[], nkey[], nonull_key))
      return UNKNOWN
    else
      return FALSE
  }
}
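For reference, the three return values above map onto the NOT IN predicate
through ordinary three-valued negation; a small sketch (the enum and
function names are illustrative, not the server's internal representation):

  enum class Value3 { F, T, UNKNOWN };  /* SQL three-valued logic */

  /* NOT IN is the negation of the IN lookup; UNKNOWN stays UNKNOWN,
     which is why a partial match makes NOT IN evaluate to NULL. */
  Value3 not_in(Value3 in_result) {
    switch (in_result) {
      case Value3::T: return Value3::F;
      case Value3::F: return Value3::T;
      default:        return Value3::UNKNOWN;
    }
  }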
2.2 The implementation of partial matching is as follows
/*
  Assumptions:
  - It has already been checked if there is a complete match by a
    regular index lookup, and the test failed.
  - It has already been checked if there is a complete NULL row,
    and if there was we wouldn't call this function. Thus we assume
    that there is no complete NULL row.
  - Not all vkey_i are empty, but some can be empty. If all were empty,
    then the only possibility for a match is a complete NULL row, which
    we already checked.

  @param outer_ref - the outer (left) IN argument.
  @param vkey[] - array of value keys
    Ordered sequences of rowids of the corresponding columns a_i, such
    that all rowids in vkey[i] are the ones where column a_i contains
    some value or NULL. Each vkey[i] is derived dynamically, for each
    different left argument of an IN predicate.
  @param nkey[] - array of NULL keys
    Bitmaps, one per column, where a bit is set if the corresponding
    row has a NULL value in that column.
  @param nonull_key - the only key over all columns of the materialized
    subquery that do not contain NULLs

  @returns
    @retval FALSE if there is no match
    @retval TRUE  if there is a partial match
*/
Boolean partial_match(outer_ref, vkey[], nkey[], nonull_key)
{
  /* Set of the keys (columns) that form a partial match. */
  Set matching_keys = {}
  /* A subset of all keys that need to be checked for NULL matches. */
  Set null_keys = {}
  Int min_key /* Key that contains the current minimum position. */
  Int min_row /* Current row number of min_key. */
  Int cur_min_key, cur_min_row
  PriorityQueue pq

  if (nonull_key && ! nonull_key->lookup(outer_ref))
    return FALSE

  for (i = 1; i <= n; i++)
  {
    if (vkey[i] != nonull_key)
      vkey[i].lookup(outer_ref)
    if (! vkey[i].is_eof())
      pq.insert(i)
  }
  /*
    Not all value keys are empty, thus we don't have only NULL
    keys. If we had, the only possible match is a NULL row, and
    we checked there is no such row, therefore the result is known
    to be FALSE.
    In fact this algorithm makes sense for at least two non-NULL
    columns.
  */
  assert(pq.elements > 1)

  (min_key, min_row) = pq.pop()
  matching_keys.add(min_key)
  vkey[min_key].next()
  if (! vkey[min_key].is_eof())
    pq.insert(min_key)

  while (TRUE)
  {
    (cur_min_key, cur_min_row) = pq.pop()

    if (cur_min_row == min_row)
    {
      matching_keys.add(cur_min_key)
      /* There cannot be a complete match, as we already checked for one. */
      assert(matching_keys.elements < n)
    }
    else if (vkey[cur_min_key] == nonull_key)
    {
      /*
        The non-NULL key has no corresponding NULL index, so we know for
        sure that the row 'min_row' is not a match.
      */
      (min_key, min_row) = (cur_min_key, cur_min_row)
      matching_keys = {min_key}
    }
    else
    {
      assert(cur_min_row > min_row) /* Follows from the use of PQ. */
      null_keys = set_difference(all keys vkey[], matching_keys)
      /*
        Check if all null_keys contain a NULL at row 'min_row'. The procedure
        internally checks all keys in a special precomputed order. A prior
        procedure determines an optimal order and a mapping
        idx_no -> idx_order (encoded as an array).
        This procedure makes sure not to match the non-NULL column.
      */
      if (test_null_row(null_keys, min_row))
        return TRUE
      else
      {
        (min_key, min_row) = (cur_min_key, cur_min_row)
        matching_keys = {min_key}
      }
    }

    vkey[cur_min_key].next()
    if (! vkey[cur_min_key].is_eof())
      pq.insert(cur_min_key)
    else if (vkey[cur_min_key] == nonull_key)
    {
      /*
        If there can't be more matches for the nonull_key, we know for sure
        there is no match, since there is no possible NULL match.
      */
      return FALSE
    }

    if (pq.is_empty())
    {
      /* Check the last row of the last column in PQ for NULL matches. */
      null_keys = set_difference(all keys vkey[], matching_keys)
      if (test_null_row(null_keys, min_row))
        return TRUE
      else
        return FALSE
    }
  }

  /* We should never get here. */
  assert(FALSE)
  return FALSE
}
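The helper test_null_row() is referenced but not spelled out above. Below is
a minimal sketch, assuming the NULL keys nkey[] are the per-column bitmaps
built in 2.1 and that null_keys is already listed in the precomputed check
order (columns covered by nonull_key never appear in it):

  #include <cstddef>
  #include <vector>

  /* True iff every key in 'null_keys' has a NULL at row 'row', i.e. the
     remaining columns can complete the partial match via NULL wildcards. */
  bool test_null_row(const std::vector<std::vector<bool>>& nkey,
                     const std::vector<std::size_t>& null_keys,
                     std::size_t row) {
    for (std::size_t key : null_keys) {
      if (!nkey[key][row])
        return false;  /* this column has a non-NULL value at 'row' */
    }
    return true;
  }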
3. Directions for improvement
========================================================================
Other consideration that may be taken into account:
1. If columns a_j1,...,a_jm do not contain null values in the temporary
table at all and v_j1,...,v_jm cannot be null, create for these columns
only one index array (and of course do not create any bitmaps for them).
[done]
2. Consider the ratio d(a_i)=N'(a_i)/V(a_i), where N'(a_i) is the number
of rows where a_i is not null, and V(a_i) is the number of distinct
values for a_i excluding nulls.
If d(a_i) is close to N'(a_i), then do not create any index array: check
whether there is a match by running through the records that have been
filtered in. Anyway, if d(a_i) is close to N'(a_i), the intersection
with rowid{a_i=v_i} will not reduce the number of remaining rowids
significantly.
In other words, unless V(a_i) exceeds some threshold, there is no sense
in creating an index for a_i.
If additionally N-N'(a_i) is small, do not create a bitmap for this
column either.
3. If for a column a_i d(a_i) is not close to N'(a_i), but N-N'(a_i) is
small, a sorted array of rowids from the set rowid{a_i is null} can be
used instead of a bitmap.
4. We always have a match if R0 = INTERSECT rowid{a_i is null} is not
empty. Here i runs through all indexes from [1..n] such that v_i is not
null. For a given subset of columns this fact has to be checked only
once. It can be easily done with bitmap intersection (see the sketch
after this list).
5. If v1,...,vn can never be null, then indexes (sorted arrays) can be
created only for rows with nulls.
6. If v1,...,vn can never be null and the number of rows with nulls is
small, do not create indexes and do not create bitmaps.
7. If you get a row with nulls in all columns, stop filling the temporary
table and return UNKNOWN for any tuple <v1,...,vn>.
[This is wrong, because if we don't fill the whole temp table, there may
be some tuple(s) that would match some outer tuple. In such cases, if we
stop filling the temp table, we would miss a TRUE result. Having a partial
match doesn't preclude us from having a complete match].
8. [timour]
Consider that due to materialization, we already have a unique index
on all columns <a_1,..., a_n>. We can use the first key part of this index
over column a_1, instead of the index rowid{a_i=v_i}. Thus we can avoid
creating the index rowid{a_i=v_i}.
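A sketch of the check in item 4 (the helper name is assumed;
std::vector<bool> stands in for the server's bitmap type, and real code
would AND whole machine words instead of testing bits one by one): if the
intersection of the NULL bitmaps of the columns bound by non-NULL outer
values is non-empty, some row is NULL in all of them, so every such outer
tuple has at least a partial match.

  #include <cstddef>
  #include <vector>

  /* Is R0 = INTERSECT rowid{a_i is null} over the columns in 'cols' non-empty? */
  bool null_row_intersection_nonempty(const std::vector<std::vector<bool>>& nkey,
                                      const std::vector<std::size_t>& cols,
                                      std::size_t n_rows) {
    for (std::size_t row = 0; row < n_rows; ++row) {
      bool all_null = true;
      for (std::size_t c : cols)
        if (!nkey[c][row]) { all_null = false; break; }
      if (all_null)
        return true;  /* this row is NULL in every column bound by the outer tuple */
    }
    return false;
  }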
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)
Re: [Maria-developers] request for advice on managing a mysql patch in bzr
by MARK CALLAGHAN 27 Feb '10
On Fri, Feb 26, 2010 at 4:19 PM, Sergei Golubchik <serg(a)askmonty.org> wrote:
> Eh, you omitted lots of crucial details here, but luckily you mentioned
> them in your email to internals, so I was able to decrypt the above :)
>
> Anyway, what happens - you are not merging one mysql branch into
> another. You merge two completely unrelated repositories. And bzr see
> that both the source and the target repositories have the files with the
> name "configure.in" for example - and this is obviously a conflict, as
> two different files have the same name, bzr cannot merge that, it's not
> the same file with the different content, they are two different files.
> Two different file keys, at least.
>
> I seriously doubt that git would handle that situation the way you want.
> By all means, bitkeeper would not - it would not even allow you to pull
> from an unrelated repository, I'm a bit surprised that bzr let you do
> that.
>
> I'm afraid your only option is to replay your changes on a mysql bzr
> branch.
I ended up doing that. Thanks for the advice.
https://code.launchpad.net/~mysqlatfacebook/mysqlatfacebook/5.1
--
Mark Callaghan
mdcallag(a)gmail.com
Re: [Maria-developers] request for advice on managing a mysql patch in bzr
by MARK CALLAGHAN 26 Feb '10
On Fri, Feb 26, 2010 at 6:36 AM, Sergei Golubchik <serg(a)askmonty.org> wrote:
> When done, you'll still have your patches on top of the tree, and you
> can trivially extract them (e.g. with bzr log -p) and publish them.
> If you merge instead of rebasing, you'll have old patches in the tree
> and merge changes spread over many merge changesets.
There apparently isn't a trivial tool to import commits between two
branches. Doing something like this does way more than apply the diff
for that one change and copy the commit message.
bzr merge -c revno:2 /s/bzr/fb/5.1
--
Mark Callaghan
mdcallag(a)gmail.com
Re: [Maria-developers] request for advice on managing a mysql patch in bzr
by MARK CALLAGHAN 26 Feb '10
On Fri, Feb 26, 2010 at 6:36 AM, Sergei Golubchik <serg(a)askmonty.org> wrote:
> Hi, MARK!
>
> On Feb 25, MARK CALLAGHAN wrote:
>>
>> I know this question is more about official MySQL than MariaDB, but it
>> is also about sharing code we write at Facebook and all of the patches
>> we publish can be reused by others as long as others accept BSD
>> contributions.
>>
>> I am getting ready to publish facebook patches for mysql 5.1. We use
>> git internally and are based on MySQL 5.1.44. We intend to stay
>> current with 5.1 releases. What is the best way to manage the code in
>> launchpad across new releases of official MySQL. My current plan is:
>>
>> 1) import MySQL 5.1.44, commit 2) apply patches from my git repo to my
>> bzr repo, commit after each one 3) publish
>>
>> But there will soon be a release of 5.1.45 or 5.1.46 and I don't want
>> to repeat the steps above each time that happens. Do I just publish a
>> large patch for 5.1.44 -> 5.1.4X and get on with my work? Is there a
>> better way?
>
> The least labor-intensive path, I think, could be something like
> Branch mysql-5.1 tree (say, up to tag:mysql-5.1.44), apply your patches
> and commit. One commit per logical patch, as usual.
>
> Then when 5.1.45 comes out you rebase (not merge) your tree to
> tag:mysql-5.1.45. Some merges you'll need to resolve manually, others
> bzr can handle automatically.
>
> When done, you'll still have your patches on top of the tree, and you
> can trivially extract them (e.g. with bzr log -p) and publish them.
> If you merge instead of rebasing, you'll have old patches in the tree
> and merge changes spread over many merge changesets.
That is the advice I need. Thanks.
--
Mark Callaghan
mdcallag(a)gmail.com
[Maria-developers] Rev 11: Merge. in file:///Users/hakan/work/monty_program/mariadb-tools/
by Hakan Kuecuekyilmaz 26 Feb '10
At file:///Users/hakan/work/monty_program/mariadb-tools/
------------------------------------------------------------
revno: 11 [merge]
revision-id: hakan(a)askmonty.org-20100226170755-uucbsqe4n60b50c1
parent: hakan(a)askmonty.org-20100226162413-118168qx5c9c02ml
parent: knielsen@hasky-20100223074854-mg13sm3jnan2s51a
committer: Hakan Kuecuekyilmaz <hakan(a)askmonty.org>
branch nick: mariadb-tools
timestamp: Fri 2010-02-26 17:07:55 +0000
message:
Merge.
modified:
buildbot/maria-master.cfg mariamaster.cfg-20091218103450-cvifjz3i70oerkej-1
=== modified file 'buildbot/maria-master.cfg'
--- a/buildbot/maria-master.cfg 2010-02-02 15:10:12 +0000
+++ b/buildbot/maria-master.cfg 2010-02-23 07:48:54 +0000
@@ -113,7 +113,8 @@
"lp:~maria-captains/maria/maria-5.2-merge-5.1" : "maria-5.2-merge-5.1",
"lp:~maria-captains/maria/5.2-dsmrr" : "5.2-dsmrr",
"lp:~maria-captains/maria/5.3" : "5.3",
- "lp:~maria-captains/maria/5.3-sj-subqueries" : "5.3-sj-subqueries"
+ "lp:~maria-captains/maria/5.3-sj-subqueries" : "5.3-sj-subqueries",
+ "lp:~maria-captains/maria/5.3-subqueries" : "5.3-subqueries"
}
mailSource = mail.BzrLaunchpadEmailMaildirSource("/var/lib/buildbot/Maildir",
branchMap=myBranchMap)
@@ -154,7 +155,7 @@
"mariadb-5.1-monty", "mariadb-5.2-monty",
"mariadb-5.1-knielsen","5.1-release", "5.1-merge",
"maria-5.1-wl24", "maria-5.1-vcol","maria-5.1-wl36","mysql-5.1-mwl36",
- "5.1.39-oqgraph", "5.2-dsmrr", "5.3", "5.3-sj-subqueries"
+ "5.1.39-oqgraph", "5.2-dsmrr", "5.3", "5.3-sj-subqueries", "5.3-subqueries"
],
treeStableTimer=60, # 1 sec for bzr
builderNames=["centos5-debug", "hardy-x86-rtai", "hardy-amd64-makedist",
@@ -531,7 +532,7 @@
"slavename": "adutko-centos5-amd64",
"builddir": "centos5-amd64-minimal",
"factory": f_minimal,
- "category": "main",
+ "category": "experimental",
}
f_win32_rel_nmake = factory.BuildFactory()
[Maria-developers] Rev 10: Do one thing, and do that one thing proper. - The Unix Way. in file:///Users/hakan/work/monty_program/mariadb-tools/
by Hakan Kuecuekyilmaz 26 Feb '10
At file:///Users/hakan/work/monty_program/mariadb-tools/
------------------------------------------------------------
revno: 10
revision-id: hakan(a)askmonty.org-20100226162413-118168qx5c9c02ml
parent: hakan(a)askmonty.org-20100219052704-8nnlzsbu0nslnxba
committer: Hakan Kuecuekyilmaz <hakan(a)askmonty.org>
branch nick: mariadb-tools
timestamp: Fri 2010-02-26 16:24:13 +0000
message:
Do one thing, and do that one thing proper. - The Unix Way.
We are running the benchmark for one source repository now. Also we run each test three times, so
that peaks and glitches are balanced out.
=== modified file 'sysbench/run-sysbench.sh'
--- a/sysbench/run-sysbench.sh 2010-02-19 05:27:04 +0000
+++ b/sysbench/run-sysbench.sh 2010-02-26 16:24:13 +0000
@@ -2,8 +2,10 @@
#
# Run sysbench tests with MariaDB and MySQL
#
-# Note: Do not run this script with root privileges.
-# We use killall -9, which can cause severe side effects!
+# Notes:
+# * Do not run this script with root privileges. We use
+# killall -9, which can cause severe side effects!
+# * By bzr pull we mean bzr merge --pull
#
# Hakan Kuecuekyilmaz <hakan at askmonty dot org> 2010-02-19.
#
@@ -19,6 +21,29 @@
exit 1
fi
+if [ $# != 3 ]; then
+ echo '[ERROR]: Please provide exactly three options.'
+ echo " Example: $0 [pull | no-pull] [/path/to/bzr/repo] [name]"
+ echo " $0 pull ${HOME}/work/monty_program/maria-local-master MariaDB"
+
+ exit 1
+else
+ PULL="$1"
+ LOCAL_MASTER="$2"
+ PRODUCT="$3"
+fi
+
+#
+# Binaries.
+#
+MYSQLADMIN='client/mysqladmin'
+
+#
+# Adjust the following paths according to your installation.
+#
+SYSBENCH='/usr/local/bin/sysbench'
+BZR='/usr/local/bin/bzr'
+
#
# Variables.
#
@@ -27,32 +52,42 @@
MY_SOCKET="${TEMP_DIR}/mysql.sock"
MYSQLADMIN_OPTIONS="--no-defaults -uroot --socket=$MY_SOCKET"
MYSQL_OPTIONS="--no-defaults \
+ --datadir=$DATA_DIR \
+ --language=./sql/share/english \
+ --max_connections=256 \
+ --query_cache_size=0 \
+ --query_cache_type=0 \
--skip-grant-tables \
- --language=./sql/share/english \
- --datadir=$DATA_DIR \
- --tmpdir=$TEMP_DIR \
--socket=$MY_SOCKET \
--table_open_cache=512 \
--thread_cache=512 \
- --query_cache_size=0 \
- --query_cache_type=0 \
+ --tmpdir=$TEMP_DIR \
+ --innodb_additional_mem_pool_size=32M \
+ --innodb_buffer_pool_size=1024M \
+ --innodb_data_file_path=ibdata1:32M:autoextend \
--innodb_data_home_dir=$DATA_DIR \
- --innodb_data_file_path=ibdata1:128M:autoextend \
- --innodb_log_group_home_dir=$DATA_DIR \
- --innodb_buffer_pool_size=1024M \
- --innodb_additional_mem_pool_size=32M \
- --innodb_log_file_size=256M \
- --innodb_log_buffer_size=16M \
+ --innodb_doublewrite=0 \
--innodb_flush_log_at_trx_commit=1 \
+ --innodb_flush_method=O_DIRECT \
--innodb_lock_wait_timeout=50 \
- --innodb_doublewrite=0 \
- --innodb_flush_method=O_DIRECT \
- --innodb_thread_concurrency=0 \
- --innodb_max_dirty_pages_pct=80"
+ --innodb_log_buffer_size=16M \
+ --innodb_log_file_size=256M \
+ --innodb_log_group_home_dir=$DATA_DIR \
+ --innodb_max_dirty_pages_pct=80 \
+ --innodb_thread_concurrency=0"
+# Number of threads we run sysbench with.
NUM_THREADS="1 4 8 16 32 64 128"
+
+# The table size we use for sysbench.
TABLE_SIZE=2000000
+
+# The run time we use for sysbench.
RUN_TIME=300
+
+# How many times we run each test.
+LOOP_COUNT=3
+
SYSBENCH_TESTS="delete.lua \
insert.lua \
oltp_complex_ro.lua \
@@ -61,6 +96,7 @@
select.lua \
update_index.lua \
update_non_index.lua"
+
SYSBENCH_OPTIONS="--oltp-table-size=$TABLE_SIZE \
--max-time=$RUN_TIME \
--max-requests=0 \
@@ -68,123 +104,86 @@
--mysql-user=root \
--mysql-engine-trx=yes"
-PRODUCTS='MariaDB MySQL'
-
# Timeout in seconds for waiting for mysqld to start.
TIMEOUT=100
#
# Files
#
-MARIADB_BUILD_LOG='/tmp/mariadb_build.log'
-MYSQL_BUILD_LOG='/tmp/mysql_build.log'
+BUILD_LOG="/tmp/${PRODUCT}_build.log"
#
# Directories.
#
BASE="${HOME}/work"
-MARIADB_LOCAL_MASTER="${BASE}/monty_program/maria-local-master"
-MARIADB_WORK="${BASE}/monty_program/maria"
-MYSQL_LOCAL_MASTER="${BASE}/mysql/mysql-server-local-master"
-MYSQL_WORK="${BASE}/mysql/mysql-server"
TEST_DIR="${BASE}/monty_program/sysbench/sysbench/tests/db"
RESULT_DIR="${BASE}/sysbench-results"
-
-#
-# Binaries.
-#
-MYSQLADMIN='./client/mysqladmin'
-SYSBENCH='/usr/local/bin/sysbench'
-BZR='/usr/local/bin/bzr'
-
-#
-# Refresh repositories.
-#
-echo "[$(date "+%Y-%m-%d %H:%M:%S")] Refreshing source repositories."
-rm -rf $MARIADB_WORK
-if [ ! -d $MARIADB_LOCAL_MASTER ]; then
- echo "[ERROR]: Local master of MariaDB does not exist."
- echo " Please make a initial branch from lp:maria"
- echo " Exiting."
- exit 1
-else
- cd $MARIADB_LOCAL_MASTER
+WORK_DIR='/tmp'
+
+if [ ! -d $LOCAL_MASTER ]; then
+ echo "[ERROR]: Supplied local master $LOCAL_MASTER does not exists."
+ echo " Please provide a valid bzr repository."
+ echo " Exiting."
+ exit 1
+fi
+
+#
+# Refresh repositories, if requested.
+#
+if [ x"$PULL" = x"pull" ]; then
+ echo "[$(date "+%Y-%m-%d %H:%M:%S")] Refreshing source repositories."
+
+ cd $LOCAL_MASTER
echo "Pulling latest MariaDB sources."
- $BZR pull
- if [ $? != 0 ]; then
- echo "[ERROR]: $BZR pull for $MARIADB_LOCAL_MASTER failed"
- echo " Please check your bzr setup"
- echo " Exiting."
- exit 1
- fi
-
- echo "Branching MariaDB working directory."
- $BZR branch $MARIADB_LOCAL_MASTER $MARIADB_WORK
- if [ $? != 0 ]; then
- echo "[ERROR]: $BZR branch of $MARIADB_LOCAL_MASTER failed"
- echo " Please check your bzr setup"
- echo " Exiting."
- exit 1
- fi
-fi
-
-rm -rf $MYSQL_WORK
-if [ ! -d $MYSQL_LOCAL_MASTER ]; then
- echo "[ERROR]: Local master of MySQL does not exist."
- echo " Please make a initial branch from lp:mysql-server"
- echo " Exiting."
- exit 1
-else
- cd $MYSQL_LOCAL_MASTER
- echo "Pulling latest MySQL sources."
- $BZR pull
- if [ $? != 0 ]; then
- echo "[ERROR]: $BZR pull for $MYSQL_LOCAL_MASTER failed"
- echo " Please check your bzr setup"
- echo " Exiting."
- exit 1
- fi
-
- echo "Branching MySQL working directory."
- $BZR branch $MYSQL_LOCAL_MASTER $MYSQL_WORK
- if [ $? != 0 ]; then
- echo "[ERROR]: $BZR branch of $MYSQL_LOCAL_MASTER failed"
- echo " Please check your bzr setup"
- echo " Exiting."
- exit 1
- fi
-fi
-
-echo "[$(date "+%Y-%m-%d %H:%M:%S")] Done refreshing source repositories."
-
-
-#
-# TODO: Add platform detection and choose proper build script.
-#
-echo "[$(date "+%Y-%m-%d %H:%M:%S")] Starting to compile."
-
-echo "[$(date "+%Y-%m-%d %H:%M:%S")] Compiling MariaDB."
-cd $MARIADB_WORK
-BUILD/compile-amd64-max > $MARIADB_BUILD_LOG 2>&1
-if [ $? != 0 ]; then
- echo "[ERROR]: Build of $MARIADB_WORK failed"
- echo " Please check the log at $MARIDB_BUILD_LOG"
- echo " Exiting."
- exit 1
-fi
-echo "[$(date "+%Y-%m-%d %H:%M:%S")] Finnished compiling MariaDB."
-
-echo "[$(date "+%Y-%m-%d %H:%M:%S")] Compiling MySQL."
-cd $MYSQL_WORK
-BUILD/compile-amd64-max > $MYSQL_BUILD_LOG 2>&1
-if [ $? != 0 ]; then
- echo "[ERROR]: Build of $MYSQL_WORK failed"
- echo " Please check the log at $MYSQL_BUILD_LOG"
- echo " Exiting."
- exit 1
-fi
-echo "[$(date "+%Y-%m-%d %H:%M:%S")] Finnished compiling MySQL."
-echo "[$(date "+%Y-%m-%d %H:%M:%S")] Finnished compiling."
+ $BZR merge --pull
+ if [ $? != 0 ]; then
+ echo "[ERROR]: $BZR pull for $LOCAL_MASTER failed"
+ echo " Please check your bzr setup and/or repository"
+ echo " Exiting."
+ exit 1
+ fi
+
+ echo "[$(date "+%Y-%m-%d %H:%M:%S")] Done refreshing source repositories."
+fi
+
+cd $WORK_DIR
+TEMP_DIR=$(mktemp -d)
+if [ $? != 0 ]; then
+ echo "[ERROR]: mktemp in $WORK_DIR failed."
+ echo 'Exiting.'
+
+ exit 1
+fi
+
+#
+# bzr export refuses to export to an existing directory,
+# therefore we use an extra build/ directory.
+#
+echo "Exporting from $LOCAL_MASTER to ${TEMP_DIR}/build"
+$BZR export --format=dir ${TEMP_DIR}/build $LOCAL_MASTER
+if [ $? != 0 ]; then
+ echo '[ERROR]: bzr export failed.'
+ echo 'Exiting.'
+
+ exit 1
+fi
+
+#
+# Compile sources.
+# TODO: Add platform detection and choose proper build script accordingly.
+#
+echo "[$(date "+%Y-%m-%d %H:%M:%S")] Starting to compile $PRODUCT."
+
+cd ${TEMP_DIR}/build
+BUILD/compile-amd64-max > $BUILD_LOG 2>&1
+if [ $? != 0 ]; then
+ echo "[ERROR]: Build of $PRODUCT failed"
+ echo " Please check your log at $BUILD_LOG"
+ echo " Exiting."
+ exit 1
+fi
+
+echo "[$(date "+%Y-%m-%d %H:%M:%S")] Finnished compiling $PRODUCT."
#
# Go to work.
@@ -194,78 +193,79 @@
#
# Prepare results directory.
#
-if [ ! -d $RESULT_DIRS ]; then
- echo "[NOTE]: $RESULT_DIRS did not exist."
+if [ ! -d $RESULT_DIR ]; then
+ echo "[NOTE]: $RESULT_DIR did not exist."
echo " We are creating it for you!"
- mkdir $RESULT_DIRS
+ mkdir $RESULT_DIR
fi
TODAY=$(date +%Y-%m-%d)
mkdir ${RESULT_DIR}/${TODAY}
-
-for PRODUCT in $PRODUCTS; do
- mkdir ${RESULT_DIR}/${TODAY}/${PRODUCT}
-
- killall -9 mysqld
- rm -rf $DATA_DIR
- rm -f $MY_SOCKET
- mkdir $DATA_DIR
-
- if [ x"$PRODUCT" = x"MariaDB" ];then
- cd $MARIADB_WORK
- else
- cd $MYSQL_WORK
- fi
-
- sql/mysqld $MYSQL_OPTIONS &
-
- j=0
- STARTED=-1
- while [ $j -le $TIMEOUT ]
- do
- $MYSQLADMIN $MYSQLADMIN_OPTIONS ping > /dev/null 2>&1
- if [ $? = 0 ]; then
- STARTED=0
-
- break
- fi
-
- sleep 1
- j=$(($j + 1))
- done
-
- if [ $STARTED != 0 ]; then
- echo '[ERROR]: Start of mysqld failed.'
- echo ' Please check your error log.'
- echo ' Exiting.'
-
- exit 1
- fi
-
- for SYSBENCH_TEST in $SYSBENCH_TESTS; do
- mkdir ${RESULT_DIR}/${TODAY}/${PRODUCT}/${SYSBENCH_TEST}
-
- for THREADS in $NUM_THREADS; do
- THIS_RESULT_DIR="${RESULT_DIR}/${TODAY}/${PRODUCT}/${SYSBENCH_TEST}/${THREADS}"
- mkdir $THIS_RESULT_DIR
- echo "[$(date "+%Y-%m-%d %H:%M:%S")] Running $SYSBENCH_TEST with $THREADS threads for $PRODUCT"
-
- $MYSQLADMIN $MYSQLADMIN_OPTIONS -f drop sbtest
- $MYSQLADMIN $MYSQLADMIN_OPTIONS create sbtest
- if [ $? != 0 ]; then
- echo "[ERROR]: Create of sbtest database failed"
- echo " Please check your setup."
- echo " Exiting"
- exit 1
- fi
-
- SYSBENCH_OPTIONS="$SYSBENCH_OPTIONS --num-threads=$THREADS --test=${TEST_DIR}/${SYSBENCH_TEST}"
- $SYSBENCH $SYSBENCH_OPTIONS prepare
- $SYSBENCH $SYSBENCH_OPTIONS run > ${THIS_RESULT_DIR}/result.txt 2>&1
-
- done
- done
+mkdir ${RESULT_DIR}/${TODAY}/${PRODUCT}
+
+killall -9 mysqld
+rm -rf $DATA_DIR
+rm -f $MY_SOCKET
+mkdir $DATA_DIR
+
+sql/mysqld $MYSQL_OPTIONS &
+
+j=0
+STARTED=-1
+while [ $j -le $TIMEOUT ]
+ do
+ $MYSQLADMIN $MYSQLADMIN_OPTIONS ping > /dev/null 2>&1
+ if [ $? = 0 ]; then
+ STARTED=0
+
+ break
+ fi
+
+ sleep 1
+ j=$(($j + 1))
+done
+
+if [ $STARTED != 0 ]; then
+ echo '[ERROR]: Start of mysqld failed.'
+ echo ' Please check your error log.'
+ echo ' Exiting.'
+
+ exit 1
+fi
+
+for SYSBENCH_TEST in $SYSBENCH_TESTS
+ do
+ mkdir ${RESULT_DIR}/${TODAY}/${PRODUCT}/${SYSBENCH_TEST}
+
+ for THREADS in $NUM_THREADS
+ do
+ THIS_RESULT_DIR="${RESULT_DIR}/${TODAY}/${PRODUCT}/${SYSBENCH_TEST}/${THREADS}"
+ mkdir $THIS_RESULT_DIR
+ echo "[$(date "+%Y-%m-%d %H:%M:%S")] Running $SYSBENCH_TEST with $THREADS threads and $LOOP_COUNT iterations for $PRODUCT" | tee ${THIS_RESULT_DIR}/results.txt
+ echo '' >> ${THIS_RESULT_DIR}/results.txt
+
+ k=0
+ while [ $k -lt $LOOP_COUNT ]
+ do
+ $MYSQLADMIN $MYSQLADMIN_OPTIONS -f drop sbtest
+ $MYSQLADMIN $MYSQLADMIN_OPTIONS create sbtest
+ if [ $? != 0 ]; then
+ echo "[ERROR]: Create of sbtest database failed"
+ echo " Please check your setup."
+ echo " Exiting"
+ exit 1
+ fi
+
+      # Build the per-run option string separately so --num-threads and --test
+      # do not accumulate across loop iterations.
+      THIS_SYSBENCH_OPTIONS="$SYSBENCH_OPTIONS --num-threads=$THREADS --test=${TEST_DIR}/${SYSBENCH_TEST}"
+      $SYSBENCH $THIS_SYSBENCH_OPTIONS prepare
+      $SYSBENCH $THIS_SYSBENCH_OPTIONS run > ${THIS_RESULT_DIR}/result${k}.txt 2>&1
+
+ grep "write requests:" ${THIS_RESULT_DIR}/result${k}.txt | awk '{ print $4 }' | sed -e 's/(//' >> ${THIS_RESULT_DIR}/results.txt
+
+ k=$(($k + 1))
+ done
+ done
done
#
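As a small follow-up: once all iterations are done, the per-run throughput figures that the loop appends to results.txt can be averaged with a one-liner. This is only a minimal sketch, assuming the directory layout used above and that results.txt holds the header line followed by one number per run:

  # Average the throughput figures collected for one test/thread combination.
  # THIS_RESULT_DIR is assumed to follow the layout used in the script above.
  awk '/^[0-9.]+$/ { sum += $1; n++ }
       END { if (n > 0) printf "average over %d runs: %.2f\n", n, sum / n }' \
      "${THIS_RESULT_DIR}/results.txt"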
Re: [Maria-developers] request for advice on managing a mysql patch in bzr
by MARK CALLAGHAN 26 Feb '10
26 Feb '10
On Fri, Feb 26, 2010 at 1:30 AM, Sergei Golubchik <serg(a)askmonty.org> wrote:
> Hi, MARK!
>
> On Feb 25, MARK CALLAGHAN wrote:
>>
>> One other comment. I don't think the official releases are in
>> launchpad. If they were I imagine this would be easier to do. Of
>> course, anyone could republish the official releases there.
>
> Official MySQL releases ?
>
> What do you mean - mysql-5.1 tree is on launchpad, all releases are
> tagged, you can branch up to any tag to get a release.
> What am I missing ?
>
I missed that there were tags. I know that now -- Harrison explained it to me.
--
Mark Callaghan
mdcallag(a)gmail.com
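(Branching up to one of those release tags would look roughly like the command below; the tag and target directory names here are only assumptions for illustration.)

  # Branch the launchpad mysql-5.1 tree up to a release tag (names assumed).
  bzr branch -r tag:mysql-5.1.43 lp:mysql-server/5.1 mysql-5.1.43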