Hi! Part 2 (of 3) Sorry got guests for birthday party that I have to make food for, so I have to finnish the last part on Sunday.
"Sergey" == Sergey Petrunya <psergey@askmonty.org> writes:
<cut>
--- 5.3-noc/sql/sql_join_cache.cc 2011-02-17 15:42:51.000000000 +0300
@@ -203,11 +159,34 @@
void JOIN_CACHE::calc_record_fields() { + JOIN_TAB *tab; + + if (prev_cache) + tab= prev_cache->join_tab; + else + { + if (join_tab->bush_root_tab) + { + /* + If the tab we're attached to is inside an SJM-nest, start from the + first tab in that SJM nest + */ + tab= join_tab->bush_root_tab->bush_children->start; + } + else + { + /* + The tab we're attached to is not inside an SJM-nest. Start from the + first non-const table. + */ + tab= join->join_tab + join->const_tables; + } + } + start_tab= tab; + if (start_tab->bush_children) + start_tab= start_tab->bush_children->start;
Is it possible that the new start_tab also have bush_childrens or is this a case for if (prev_cache). If it's the later (conclusion we took on IRC), move the if to the first if part. To make this clear, add at end: DBUG_ASSERT(!start_tab->bush_children);
+ + tab= start_tab;
Above code is a simpler if you use start_tab in all the places and only last set tab= start_tab
@@ -524,7 +507,8 @@
/* Now create local fields that are used to build ref for this key access */ copy= field_descr+flag_fields; - for (tab= join_tab-tables; tab; tab= get_next_table(tab)) + for (tab= start_tab; tab != join_tab; tab= next_linear_tab(join, tab, FALSE)) + //for (tab= join_tab-tables; tab; tab= get_next_table(tab))
Remove comment
@@ -2155,8 +2140,12 @@ }
/* Prepare to retrieve all records of the joined table */ - if ((error= join_tab_scan->open())) + if ((error= join_tab_scan->open())) goto finish; /* psergey-note: if this returns error, we will assert in net_send_statement() */
Remove psgergey-note and do a proper comment before the goto.
+void save_or_restore_used_tabs(JOIN_TAB *join_tab, bool save) +{ + JOIN_TAB *first= join_tab->bush_root_tab? + join_tab->bush_root_tab->bush_children->start : + join_tab->join->join_tab + join_tab->join->const_tables; + + for (JOIN_TAB *tab= join_tab-1; tab != first && !tab->cache; tab--) + { + if (tab->bush_children) + { + for (JOIN_TAB *child= tab->bush_children->start; + child != tab->bush_children->end; + child++) + { + if (save) + child->table->status= child->status;
an 'else' is missing here. Please spend a short time trying to create a test case that catches this bug, as it will ensure that this part of the code is properly tested...
+ { + tab->status= tab->table->status; + tab->table->status= 0; + } + } + }
<cut>
diff -urN --exclude='.*' 5.3-noc/sql/sql_join_cache.h maria-5.3-subqueries-r36-noc/sql/sql_join_cache.h --- 5.3-noc/sql/sql_join_cache.h 2011-02-17 15:42:51.000000000 +0300 +++ maria-5.3-subqueries-r36-noc/sql/sql_join_cache.h 2011-02-21 18:15:06.000000000 +0300
<cut>
@@ -647,7 +650,7 @@ buff= 0; }
- JOIN_TAB *get_next_table(JOIN_TAB *tab); + // JOIN_TAB *get_next_table(JOIN_TAB *tab);
Remove comment
+++ maria-5.3-subqueries-r36-noc/sql/sql_select.cc 2011-02-22 17:19:29.000000000 +0300
@@ -1036,50 +1037,72 @@ /* Perform the optimization on fields evaluation mentioned above for all on expressions. + */ + { + List_iterator<JOIN_TAB_RANGE> it(join_tab_ranges); + JOIN_TAB_RANGE *jt_range;
Add comment: /* Skip the const tables from the upper level JOIN_TAB's. We don't have to do this for SJM nests as the const tables are pulled out from these and added to the upper level JOIN_TAB. */
+ uint first_tab_offs= const_tables;
<cut>
/* Perform the optimization on fields evaliation mentioned above for all used ref items. */ - for (JOIN_TAB *tab= join_tab + const_tables; tab < join_tab + tables; tab++) + //for (JOIN_TAB *tab= join_tab + const_tables; tab < join_tab + tables; tab++) + //{
Remove comments
{ - for (uint i=0; i < tab->ref.key_parts; i++) + List_iterator<JOIN_TAB_RANGE> it(join_tab_ranges); + JOIN_TAB_RANGE *jt_range; + uint first_tab_offs= const_tables; + while ((jt_range= it++)) { + for (JOIN_TAB *tab= jt_range->start + first_tab_offs; + tab < jt_range->end; tab++) + { + for (uint i=0; i < tab->ref.key_parts; i++) + { +
This is a big change that causes a lot of code to be re-indented which will cause merge conflicts. The above also exposes a little too much how the tables are organized. Please replace loop with either: for (tab= first_linear_tab(...) ; tab ; tab= next_linear_tab(...)) Or add a set of similar functions to loop over join_tab_ranges: This also simplifies the usage of-
@@ -1352,8 +1375,11 @@ */ if (need_tmp || select_distinct || group_list || order) { - for (uint i = const_tables; i < tables; i++) - join_tab[i].table->prepare_for_position();
+ for (uint i= 0; i < tables; i++) + { + if (!(table[i]->map & const_table_map)) + table[i]->prepare_for_position(); + } }
Did not understand the above change as const tables should still be first in join_tab[] and all tables in 'table' should have a correspoding join_tab element. The fact that we added some sjm tables, should not change this, should it? After discussion on IRC: There is no mapping between join->join_tab[] and join->table[] anymore. join->table contains all 'real tables' while join_tab also contains sjm (materialized) tables. This makes me worried that one may accidently use 'join->tables' wrongly when they mean join->top_jtrange_tables. Suggestion. Rename join->tables to 'join->real_tables' Rename join->top_jtrange_tables to 'join->join_tab_tables' or Rename join->tables to 'join->table_count' Rename join->top_jtrange_tables to 'join->join_tab_count'
DBUG_EXECUTE("info",TEST_join(this);); @@ -1604,9 +1630,12 @@ if (conds) conds= conds->transform(&Item::expr_cache_insert_transformer, (uchar*) thd); + for (JOIN_TAB *tab= first_linear_tab(this, TRUE); + tab; + tab= next_linear_tab(this, tab, TRUE)) + //for (JOIN_TAB *tab= join_tab + const_tables; + // tab < join_tab + tables ; + // tab++)
Remove comment
@@ -2799,10 +2828,12 @@ JOIN_TAB *stat_vector[MAX_TABLES+1]; DBUG_ENTER("make_join_statistics");
+ LINT_INIT(table); /* inited in all loops */ table_count=join->tables; - stat=(JOIN_TAB*) join->thd->calloc(sizeof(JOIN_TAB)*table_count); + + stat=(JOIN_TAB*) join->thd->calloc(sizeof(JOIN_TAB)*(table_count)); stat_ref=(JOIN_TAB**) join->thd->alloc(sizeof(JOIN_TAB*)*MAX_TABLES); - table_vector=(TABLE**) join->thd->alloc(sizeof(TABLE*)*(table_count*2)); + table_vector=(TABLE**) join->thd->alloc(sizeof(TABLE*)*((table_count)*2));
Don't understand the above change. please revert it. (no need to have () around table_count)
if (!stat || !stat_ref || !table_vector) DBUG_RETURN(1); // Eom /* purecov: inspected */
@@ -2835,9 +2866,10 @@ table->reginfo.join_tab=s; table->reginfo.not_exists_optimize=0; bzero((char*) table->const_key_parts, sizeof(key_part_map)*table->s->keys); - all_table_map|= table->map; + all_table_map|= s->table->map;
Revert, as we here know that s->table == table (assignment done a couple of lines before)
s->join=join; s->info=0; // For describe + s->bush_root_tab= NULL;
Remove the above two lines as all elements in s are guaranteed to be 0
s->dependent= tables->dep_tables; s->key_dependent= 0;
You can remove the above line too
@@ -2998,6 +3033,9 @@ { table=s->table;
+ if (table->is_filled_at_execution()) + continue; + /* If equi-join condition by a key is null rejecting and after a substitution of a const table the key value happens to be null @@ -3155,15 +3193,30 @@
for (s=stat ; s < stat_end ; s++) { + s->startup_cost= 0; if (s->type == JT_SYSTEM || s->type == JT_CONST) { /* Only one matching row */ - s->found_records=s->records=s->read_time=1; s->worst_seeks=1.0; + s->found_records= s->records= 1; + s->read_time=1.0; + s->worst_seeks=1.0; continue; } /* Approximate found rows and time to read them */ - s->found_records=s->records=s->table->file->stats.records; - s->read_time=(ha_rows) s->table->file->scan_time(); + + if (s->table->is_filled_at_execution()) + { + get_delayed_table_estimates(s->table, &s->records, &s->read_time, + &s->startup_cost); + s->found_records= s->records;
Add a comment why the following is set as we don't set it below.
+ table->quick_condition_rows=s->records; + } + else + { + s->found_records= s->records= s->table->file->stats.records; + s->read_time= s->table->file->scan_time(); + } +
+JOIN_TAB *first_linear_tab(JOIN *join, bool after_const_tables) +{ + JOIN_TAB *first= join->join_tab; + if (after_const_tables) + first+= join->const_tables; + if (first < join->join_tab + join->top_jtrange_tables) + return first;
Add comment: /* All tables where const tables */
+ return NULL; +}
+ + +/* + A helper function to loop over all join's join_tab in sequential fashion + + DESCRIPTION + Depending on include_bush_roots parameter, JOIN_TABS that represent + SJM-scan/lookups are produced or omitted. + + SJM-Bush children are returned right after (or in place of) their container + join tab (TODO: does anybody depend on this? A: make_join_readinfo() seems + to.)
Add comment: If one is scanning with include_bush_roots, subquent calls to next_linear_tab() will return: ot1 ot2 sjm it1 it2 it3 ot3 .... When include_bush_rooots=FALSE we get: ot1 ot2 it1 it2 it3 ot3 .... (note the lack of sjm)
+*/ + +JOIN_TAB *next_linear_tab(JOIN* join, JOIN_TAB* tab, bool include_bush_roots) +{ + if (include_bush_roots && tab->bush_children) + return tab->bush_children->start; + + DBUG_ASSERT(!tab->last_leaf_in_bush || tab->bush_root_tab);
+ if (tab->last_leaf_in_bush) + tab= tab->bush_root_tab; + + if (tab->bush_root_tab) + return ++tab; + + if (++tab == join->join_tab + join->top_jtrange_tables) + return NULL; + + if (!include_bush_roots && tab->bush_children) + tab= tab->bush_children->start; + + return tab; +}
Here is the above function with more comments and I moved the bush_root_tab handling into one place JOIN_TAB *next_linear_tab(JOIN* join, JOIN_TAB* tab, bool include_bush_roots) { if (include_bush_roots && tab->bush_children) { /* This JOIN_TAB is a SJM nest; Start from first table in nest */ return tab->bush_children->start; } DBUG_ASSERT(!tab->last_leaf_in_bush || tab->bush_root_tab); if (tab->bush_root_tab) /* Are we inside an SJM nest */ { /* Inside SJM nest */ if (!tab->last_leaf_in_bush) return tab+1; /* Return next in nest */ /* Continue from the sjm on the top level */ tab= tab->bush_root_tab; } /* If no more JOIN_TAB's on the top level */ if (++tab == join->join_tab + join->top_jtrange_tables) return NULL; if (!include_bush_roots && tab->bush_children) { /* This JOIN_TAB is a SJM nest; Start from first table in nest */ tab= tab->bush_children->start; } return tab; } Conclusion from IRC: <spetrunia> if the caller intends to enumerate bush roots, too, then he starts from the first join_tab (that is an sjm) <spetrunia> if the caller intends to skip bush roots, then its his responsibility to start from non-sjm <spetrunia> Hm.. is suppose we could isolate the above logic in first_linear_tab() function <spetrunia> because right now it's all over the place <montywi> yes, that is confusing me a bit <montywi> it would be good to always start with 'first linear_tab()' and then do next_linear_tab() <montywi> and first_linear tab would also have the same include_bush_roots argument <montywi> wouldn't that solve it? <spetrunia> yes Please change then the logic as above.
+ + +/* + A helper function to iterate over all join tables in bush-children-first order + + DESCRIPTION + + For example, for this join plan + + ot1 ot2 sjm ot3 + | +--------+ + | | + it1 it2 it3 + + + the function will return + + ot1-ot2-it1-it2-it3-sjm-ot3 ... + +*/
Good comment. More of these!
+JOIN_TAB *next_depth_first_tab(JOIN* join, JOIN_TAB* tab) +{ + bool start= FALSE; + if (tab == NULL) + { + /* This means we're starting the enumeration */ + if (join->const_tables == join->top_jtrange_tables) + return NULL; + + tab= join->join_tab + join->const_tables; + start= TRUE; + } + + if (tab->last_leaf_in_bush) + return tab->bush_root_tab; + + /* Move to next tab in the array we're traversing*/ + if (!start) + tab++; + + if (tab == join->join_tab_ranges.head()->end) + return NULL; /* End */
Replace with: if (tab == join->join_tab + join->top_jtrange_tables) Because this is simpler and similar to the earlier test in same function
+ + if (tab->bush_children) + return tab->bush_children->start; + + return tab; +}
I think it's better to have a separate function for the first case as it makes things faster and easier to understand: JOIN_TAB *start_depth_first_tab(JOIN *join) { JOIN_TAB* tab; if (join->const_tables == join->top_jtrange_tables) return NULL; tab= join->join_tab + join->const_tables; return (tab->bush_children) ? tab->bush_children->start : tab; }
@@ -6349,7 +6511,7 @@ static bool get_best_combination(JOIN *join) { - uint i,tablenr; + uint tablenr; table_map used_tables; JOIN_TAB *join_tab,*j; KEYUSE *keyuse;
@@ -6368,10 +6530,74 @@
fix_semijoin_strategies_for_picked_join_order(join);
+ JOIN_TAB_RANGE *root_range= new JOIN_TAB_RANGE;
Please move the code for replacing join to another function to keep get_best_combination() small. Check for out of memory condition here.
+ root_range->start= join->join_tab; + /* root_range->end will be set later */ + join->join_tab_ranges.empty(); + join->join_tab_ranges.push_back(root_range);
Check for out of memory condition here.
+ + JOIN_TAB *sjm_nest_end= NULL; + JOIN_TAB *sjm_saved_tab; /* protected by sjm_nest_end */ + LINT_INIT(sjm_saved_tab); + for (j=join_tab, tablenr=0 ; tablenr < table_count ; tablenr++,j++) { TABLE *form; + POSITION *cur_pos= &join->best_positions[tablenr]; + if (cur_pos->sj_strategy == SJ_OPT_MATERIALIZE || + cur_pos->sj_strategy == SJ_OPT_MATERIALIZE_SCAN) + {
Please move this to another function to keep get_best_combination() short.
+ /* + Ok, we've entered an SJ-Materialization semi-join (note that this can't + be done recursively, semi-joins are not allowed to be nested). + */ + /* + 1. Put into main join order a JOIN_TAB that represents a lookup or scan + in the temptable. + // TODO: record this join_tab to be processed by + // setup_semijoin_elimination? + */ + bzero(j, sizeof(JOIN_TAB)); + j->join= join; + j->table= NULL; //temporary way to tell SJM tables from others. + j->ref.key = -1; + j->ref.key_parts=0; + j->loosescan_match_tab= NULL; //non-nulls will be set later + j->use_join_cache= FALSE; + j->on_expr_ref= (Item**) &null_ptr; + j->cache= NULL; + j->keys= key_map(1); // The unique index is always in 'possible keys' in EXPLAIN
Above remove all setting of things to 0 or NULL as bzero does this for you!
+ /* + 2. Proceed with processing SJM nest's join tabs, putting them into the + sub-order + */ + SJ_MATERIALIZATION_INFO *sjm= cur_pos->table->emb_sj_nest->sj_mat_info; + j->records= j->records_read= (ha_rows)(sjm->is_sj_scan? sjm->rows : 1); + JOIN_TAB *jt= (JOIN_TAB*)join->thd->alloc(sizeof(JOIN_TAB) * sjm->tables);
Test for out of memory!
+ JOIN_TAB_RANGE *jt_range= new JOIN_TAB_RANGE;
Test for out of memory!
+ jt_range->start= jt; + jt_range->end= jt + sjm->tables; + //sjm->jt_range= jt_range;
Remove comment
+ join->join_tab_ranges.push_back(jt_range);
Test for out of memory!
+ j->bush_children= jt_range; + j->bush_root_tab= NULL; //note: a lot of code depends on bush nodes not containing one another + j->quick= NULL;
Remove setting of the above NULL's
+ sjm_nest_end= jt + sjm->tables; + sjm_saved_tab= j; + root_range->end= j+1;
Why set root_range->end here (and below) ?
+ + j= jt; + //goto loop_end_not_table;
Remove comment.
+ } + *j= *join->best_positions[tablenr].table; + + if (sjm_nest_end) + j->bush_root_tab= sjm_saved_tab;
You can do this simpler by having sjm_saved_tab set to 0 at start and reset to 0 when sjm_nest_end is set. This allows you to simple do: j->bush_root_tab= sjm_saved_tab;
+ else + root_range->end= j+1;
Why set root_range->end here (and above) ? Isn't it better to set this once end of loop ? Becasue now you set it for every found sjm...
form=join->table[tablenr]=j->table; used_tables|= form->map; form->reginfo.join_tab=j; @@ -6379,14 +6605,14 @@ form->reginfo.not_exists_optimize=0; // Only with LEFT JOIN DBUG_PRINT("info",("type: %d", j->type)); if (j->type == JT_CONST) - continue; // Handled in make_join_stat.. + goto loop_end; // Handled in make_join_stat..
j->loosescan_match_tab= NULL; //non-nulls will be set later j->ref.key = -1; j->ref.key_parts=0;
if (j->type == JT_SYSTEM) - continue; + goto loop_end; if ( !(keyuse= join->best_positions[tablenr].key) || (join->best_positions[tablenr].sj_strategy == SJ_OPT_LOOSE_SCAN)) { @@ -6397,10 +6623,24 @@ } else if (create_ref_for_key(join, j, keyuse, used_tables)) DBUG_RETURN(TRUE); // Something went wrong
+ loop_end: + j->records_read= (ha_rows)join->best_positions[tablenr].records_read;
Why the above addition? (Was not part of the original code) Add at least a comment about this.
+ join->map2table[j->table->tablenr]= j; + + // If we've reached the end of sjm nest, switch back to main sequence + if (j + 1 == sjm_nest_end) + { + j->last_leaf_in_bush= TRUE; + j= sjm_saved_tab; + sjm_nest_end= NULL;
If you use my suggestion about simpliying setting of bush_root_tab you should set sjm_save_tab to NULL here.
+ } }
+ //for (i=0 ; i < table_count ; i++) + // join->map2table[join->join_tab[i].table->tablenr]=join->join_tab+i;
Remove comment
update_depend_map(join); DBUG_RETURN(0); } @@ -6741,6 +6981,8 @@
join_tab= parent->join_tab_reexec; table= &parent->table_reexec[0]; parent->table_reexec[0]= temp_table; + top_jtrange_tables= 1; + tables= 1;
You can do: top_jtrange_tables= table = 1; /* We have only one table and one join_tab*/
@@ -6786,6 +7028,9 @@ join_tab->loosescan_match_tab= NULL; join_tab->emb_sj_nest= NULL; join_tab->pre_idx_push_select_cond= NULL; + join_tab->bush_root_tab= NULL; + join_tab->bush_children= NULL; + join_tab->last_leaf_in_bush= FALSE;
To make things safe, change earlier: !(parent->join_tab_reexec= (JOIN_TAB*) thd->alloc(sizeof(JOIN_TAB)))) to !(parent->join_tab_reexec= (JOIN_TAB*) thd->calloc(sizeof(JOIN_TAB)))) To guarantee that all elements are 0
@@ -6996,58 +7243,72 @@ make_outerjoin_info(JOIN *join) { DBUG_ENTER("make_outerjoin_info"); - for (uint i=join->const_tables ; i < join->tables ; i++) - { - JOIN_TAB *tab=join->join_tab+i; - TABLE *table=tab->table; - TABLE_LIST *tbl= table->pos_in_table_list; - TABLE_LIST *embedding= tbl->embedding; + bool top= TRUE; + List_iterator<JOIN_TAB_RANGE> it(join->join_tab_ranges); + JOIN_TAB_RANGE *jt_range;
- if (tbl->outer_join) + while ((jt_range= it++)) + {
Please see if you can fix this to use first_linear_tab() ... next_linear_tab(join, tab, FALSE)) to not get another indentation level (and simpler code)
+ for (JOIN_TAB *tab=jt_range->start + (top ? join->const_tables : 0); + tab != jt_range->end; tab++) { + TABLE *table=tab->table; /* + psergey: The following is probably incorrect, fix it when we get + semi+outer joins processing to work: */
When is that?
+ if (!table) continue; + TABLE_LIST *tbl= table->pos_in_table_list; + TABLE_LIST *embedding= tbl->embedding; + + if (tbl->outer_join) { /* + Table tab is the only one inner table for outer join. + (Like table t4 for the table reference t3 LEFT JOIN t4 ON t3.a=t4.a + is in the query above.) + */ + tab->last_inner= tab->first_inner= tab; + tab->on_expr_ref= &tbl->on_expr; tab->cond_equal= tbl->cond_equal; - if (embedding->embedding) - tab->first_upper= embedding->embedding->nested_join->first_nested;
Why remove the above? (I can see that could be wrong as we don't know if embedding exists, and it should at least be inside the if below, but why remove it?)
+ JOIN_TAB *first_inner_tab= tab->first_inner; + + if (tab->table) + current_map= tab->table->map; + else + current_map= tab->bush_children->start->emb_sj_nest->sj_inner_tables; +
Please rename 'sj_inner_tables' to sj_inner_table_map' to make the above a bit easier to read. Also, as we spoke on phone that it would be good that after the WL is done, you add some documentation also of the emb_sj_nest into the comment of opt_subselect.c. It would be good to have the whole tab->bush_children structure documented in one place and how and why one should use JOIN_TAB to connect to TABLE_LIST. (The reason is that a goal should be that after we have created JOIN_TAB we should never have to again refer to TABLE_LIST objects and to reach this goal we need to understand how things works now)
bool use_quick_range=0; COND *tmp;
if (tab->type != JT_ALL)
Regards, Monty