developers
Threads by month
- ----- 2025 -----
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2009 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- 5 participants
- 6823 discussions
Hi Timour,
This is the first portion of the feedback.
A general note about EXPLAIN output: "Subqueries: 2" looks ugly. A human will
read this as " there are two subqueries somewhere", which is clearly not the
case. All other lists (indexes for index merge, ref components) are
comma-separated, while list of subquery ids uses spaces.
As far as I understand, @@optimizer_switch flags and other controls for dynamic
condition are present, but the feature is not working? We can't push it in
this state, because this will be very confusing for anybody trying to use the
new features.
Further comments below marked with 'psergey:'
> === modified file 'sql/item.cc'
> --- sql/item.cc 2013-08-20 11:48:29 +0000
> +++ sql/item.cc 2013-09-27 08:25:14 +0000
> @@ -457,7 +457,8 @@ int Item::save_str_value_in_field(Field
> Item::Item():
> is_expensive_cache(-1), rsize(0), name(0), orig_name(0), name_length(0),
> fixed(0), is_autogenerated_name(TRUE),
> - collation(&my_charset_bin, DERIVATION_COERCIBLE)
> + collation(&my_charset_bin, DERIVATION_COERCIBLE)/*TODO,
> + pushdown_tables(0)*/
> {
> marker= 0;
> maybe_null=null_value=with_sum_func=with_field=unsigned_flag=0;
> @@ -466,7 +467,7 @@ int Item::save_str_value_in_field(Field
> with_subselect= 0;
> cmp_context= IMPOSSIBLE_RESULT;
> /* Initially this item is not attached to any JOIN_TAB. */
> - join_tab_idx= MAX_TABLES;
> + join_tab_idx= -1;
>
> /* Put item in free list so that we can free all items at end */
> THD *thd= current_thd;
> @@ -515,7 +516,8 @@ int Item::save_str_value_in_field(Field
> is_autogenerated_name(item->is_autogenerated_name),
> with_subselect(item->has_subquery()),
> collation(item->collation),
> - cmp_context(item->cmp_context)
> + cmp_context(item->cmp_context)/*TODO,
> + pushdown_tables(0)*/
> {
> next= thd->free_list; // Put in free list
> thd->free_list= this;
> @@ -583,7 +585,7 @@ void Item::cleanup()
> DBUG_PRINT("enter", ("this: %p", this));
> fixed=0;
> marker= 0;
> - join_tab_idx= MAX_TABLES;
> + join_tab_idx= -1;
> if (orig_name)
> name= orig_name;
> DBUG_VOID_RETURN;
> @@ -605,6 +607,19 @@ bool Item::cleanup_processor(uchar *arg)
>
>
> /**
> + Add the cost of an item to the cost of previous items in the
> + containing expression tree.
> +
> + @param arg the cost of the previously visited items in the exression tree
> +*/
> +
> +bool Item::sum_cost_processor(uchar *arg)
> +{
> + *((double*)arg)= *((double*)arg) + get_cost();
> + return false;
> +}
> +
> +/**
> rename item (used for views, cleanup() return original name).
>
> @param new_name new name of item;
> @@ -8719,6 +8734,17 @@ void Item_cache::set_null()
> }
>
>
> +void Item_cache::update_used_tables()
> +{
> + example->update_used_tables();
> + if (example->const_item())
> + {
> + used_table_map= 0;
> + maybe_null= example->maybe_null;
> + }
> +}
> +
> +
> bool Item_cache_int::cache_value()
> {
> if (!example)
> @@ -9649,7 +9675,7 @@ table_map Item_ref_null_helper::used_tab
> #ifndef DBUG_OFF
>
> /* Debugger help function */
> -static char dbug_item_print_buf[256];
> +static char dbug_item_print_buf[2048];
>
> const char *dbug_print_item(Item *item)
> {
>
> === modified file 'sql/item.h'
> --- sql/item.h 2013-07-17 19:24:29 +0000
> +++ sql/item.h 2013-09-27 13:22:36 +0000
> @@ -578,10 +578,12 @@ class Item {
> make_cond_for_table procedure. During query execution, this item is
> evaluated when the join loop reaches the corresponding JOIN_TAB.
>
> - If the value of join_tab_idx >= MAX_TABLES, this means that there is no
> - corresponding JOIN_TAB.
> + If join_tab_idx < 0, this means that there is no corresponding JOIN_TAB.
> +
> + cur_join_tab_idx is used during join optimization to determine the best
> + position where to push an expression.
> */
> - uint join_tab_idx;
> + int join_tab_idx;
>
> public:
> static void *operator new(size_t size) throw ()
> @@ -658,6 +660,9 @@ class Item {
> subselect */
> DTCollation collation;
> Item_result cmp_context; /* Comparison context */
> + /* tables in the QEP where this predicate should be pushed to */
> + //table_map pushdown_tables;
> +
psergey: It seems, you've introduced the above member, but then decided not to
use it? Please remove all leftover comments, then.
> // alloc & destruct is done as start of select using sql_alloc
> Item();
> /*
> @@ -1006,6 +1011,7 @@ class Item {
> have to be updated in update_used_tables()
> */
> virtual table_map not_null_tables() const { return used_tables(); }
> +
> /*
> Returns true if this is a simple constant item like an integer, not
> a constant expression. Used in the optimizer to propagate basic constants.
> @@ -1172,12 +1178,14 @@ class Item {
> virtual bool remove_fixed(uchar * arg) { fixed= 0; return 0; }
> virtual bool cleanup_processor(uchar *arg);
> virtual bool collect_item_field_processor(uchar * arg) { return 0; }
> + virtual bool collect_item_subselect_processor(uchar * arg) { return 0; }
> virtual bool add_field_to_set_processor(uchar * arg) { return 0; }
> virtual bool find_item_in_field_list_processor(uchar *arg) { return 0; }
> virtual bool find_item_processor(uchar *arg);
> virtual bool change_context_processor(uchar *context) { return 0; }
> virtual bool reset_query_id_processor(uchar *query_id_arg) { return 0; }
> virtual bool is_expensive_processor(uchar *arg) { return 0; }
> + virtual bool sum_cost_processor(uchar *arg);
> virtual bool register_field_in_read_map(uchar *arg) { return 0; }
> virtual bool register_field_in_write_map(uchar *arg) { return 0; }
> virtual bool enumerate_field_refs_processor(uchar *arg) { return 0; }
> @@ -1196,6 +1204,22 @@ class Item {
> virtual bool find_selective_predicates_list_processor(uchar *opt_arg)
> { return 0; }
>
> + /**
> + Set the join tab index to the minimal (left-most) JOIN_TAB to which this
> + Item is attached. The number is an index in depth_first_tab() traversal
> + order.
> + */
> + virtual bool set_join_tab_idx_processor(uchar *join_tab_idx_arg)
> + {
> + /*
> + Items can participate in several pushdown conditions attached to
> + different join tabs. The join tab index property identifies the
> + left-most join tab in the join plan.
> + */
> + set_join_tab_idx(*((int*) join_tab_idx_arg));
> + return false;
> + }
> +
> /* To call bool function for all arguments */
> struct bool_func_call_args
> {
> @@ -1450,6 +1474,32 @@ class Item {
> is_expensive_cache= walk(&Item::is_expensive_processor, 0, (uchar*)0);
> return test(is_expensive_cache);
> }
> + /**
> + The default cost to evaluate a 'cheap' predicate.
> + TODO:
> + Notice that this also adds the cost of accessing Fields (via Item_field).
> + Perhaps this cost should be smaller or 0.
> + */
> + virtual double get_cost() { return (1 / (double)TIME_FOR_COMPARE); }
> + /* The default execution-time cost of an Item is its compile-time cost. */
> + virtual double exec_cost() { return get_cost(); }
psergey: I'm reading the above and I don't understand get_cost() and
exec_cost().
> +
> +
> + /**
> + Return true if this Item must be wrapped in a dynamic condition.
> + */
> + bool needs_dynamic_cond()
> + {
> + /*
> + @todo Do not check if a subquery is expensive, because even subqueries
> + that the optimizer reports to be cheap, are actually quite expensive to
> + re-evaluate multiple times. For instance the scalar SQ of DBT3-Q20
> + reports only ~7 examined rows, but still it is very beneficial to move
> + it to later JOIN_TAB in the query plan.
> + */
> + return (with_subselect && !const_item() /* && is_expensive() */);
> + }
> +
> virtual Field::geometry_type get_geometry_type() const
> { return Field::GEOM_GEOMETRY; };
> String *check_well_formed_result(String *str, bool send_error= 0);
> @@ -1488,12 +1538,12 @@ class Item {
> Item is attached. The number is an index is depth_first_tab() traversal
> order.
> */
> - virtual void set_join_tab_idx(uint join_tab_idx_arg)
> + virtual void set_join_tab_idx(int join_tab_idx_arg)
> {
> - if (join_tab_idx_arg < join_tab_idx)
> + if (join_tab_idx < 0 || join_tab_idx_arg < join_tab_idx)
> join_tab_idx= join_tab_idx_arg;
> }
> - virtual uint get_join_tab_idx() { return join_tab_idx; }
> + virtual int get_join_tab_idx() { return join_tab_idx; }
>
> table_map view_used_tables(TABLE_LIST *view)
> {
> @@ -4065,6 +4115,7 @@ class Item_cache: public Item_basic_cons
> }
>
> void set_used_tables(table_map map) { used_table_map= map; }
> + void update_used_tables();
>
> virtual bool allocate(uint i) { return 0; }
> virtual bool setup(Item *item)
> @@ -4443,4 +4494,8 @@ class Item_iterator_row: public Item_ite
> void close() {}
> };
>
> +#ifndef DBUG_OFF
> +const char *dbug_print_item(Item *item);
> +#endif /*DBUG_OFF*/
> +
> #endif /* SQL_ITEM_INCLUDED */
>
> === modified file 'sql/item_cmpfunc.cc'
> --- sql/item_cmpfunc.cc 2013-08-20 11:48:29 +0000
> +++ sql/item_cmpfunc.cc 2013-09-27 08:25:14 +0000
> @@ -461,7 +461,17 @@ static bool convert_const_to_int(THD *th
> field_item->field_type() != MYSQL_TYPE_YEAR)
> return 1;
>
> - if ((*item)->const_item() && !(*item)->is_expensive())
> + /*
> + An item can be converted to an int if:
> + - is constant, and
> + - it is not a subquery (or contains a subquery), and
> + - it is cheap to compute.
> + The reason for the second requirement is to prevent subquery
> + optimization/execution during the JOIN::prepare phase, because this
> + function is called during the prepare phase as well.
> + */
> + if ((*item)->const_item() &&
> + !(*item)->with_subselect && !(*item)->is_expensive())
> {
> TABLE *table= field->table;
> ulonglong orig_sql_mode= thd->variables.sql_mode;
> @@ -1478,7 +1488,8 @@ bool Item_in_optimizer::fix_left(THD *th
> eval_not_null_tables(NULL);
> with_sum_func= args[0]->with_sum_func;
> with_field= args[0]->with_field;
> - if ((const_item_cache= args[0]->const_item()))
> + if (!(args[0]->with_subselect || args[0]->is_expensive()) &&
> + (const_item_cache= args[0]->const_item()))
> {
> cache->store(args[0]);
> cache->cache_value();
> @@ -4685,6 +4696,17 @@ void Item_cond::neg_arguments(THD *thd)
> }
>
>
> +double Item_cond::exec_cost()
> +{
> + List_iterator_fast<Item> li(list);
> + Item *item;
> + double cost= 0;
> + while ((item= li++))
> + cost+= item->exec_cost();
> + return cost;
> +}
> +
> +
> void Item_cond_and::mark_as_condition_AND_part(TABLE_LIST *embedding)
> {
> List_iterator<Item> li(list);
> @@ -6386,3 +6408,295 @@ longlong Item_func_dyncol_exists::val_in
> null_value= TRUE;
> return 0;
> }
> +
> +
> +double Item_func_collect_stats::exec_cost()
> +{
> + return args[0]->exec_cost();
> +}
> +
> +
> +double Item_func_collect_stats::exec_selectivity()
> +{
> + return count_evals ? count_true / count_evals : 0;
> +}
> +
> +
> +#ifndef DBUG_OFF
> +void Item_func_collect_stats::dbug_print_stats()
> +{
> + char buff[1024];
> + String str(buff,(uint32) sizeof(buff), system_charset_info);
> + str.length(0);
> + str.extra_allocation(1024);
> + print(&str, QT_ORDINARY);
> +
> + (void) fprintf(DBUG_FILE, " sel: %2.2f, cond_evals: %f, cond_true: %f (",
> + exec_selectivity(), count_evals, count_true);
> + (void) fputs(str.c_ptr_safe(),DBUG_FILE);
> + (void) fprintf(DBUG_FILE, ")\n");
> +
> + if (is_cond_and(args[0]))
> + {
> + List_iterator<Item> and_parts_it(*((Item_cond*) args[0])->argument_list());
> + Item *and_part;
> + (void) fprintf(DBUG_FILE, " Individual conjunct selectivity: {\n");
> + while ((and_part= and_parts_it++))
> + {
> + if (and_part->type() == Item::FUNC_ITEM &&
> + (((Item_func*)and_part)->functype() == Item_func::DYNAMIC_COND_FUNC ||
> + ((Item_func*)and_part)->functype() == Item_func::EXEC_STATS_FUNC))
> + {
> + Item_func_collect_stats *stat_cond= (Item_func_collect_stats*) and_part;
> + stat_cond->dbug_print_stats();
> + }
> + }
> + (void) fprintf(DBUG_FILE, "}\n");
> + }
> +}
> +#endif /*DBUG_OFF*/
> +
> +
> +Item_func_dynamic_cond::Item_func_dynamic_cond(Item *cond, JOIN_TAB *atab, JOIN_TAB **ctab)
> + : Item_func_collect_stats(cond)
> +{
> + JOIN *join= atab->join;
> + JOIN_TAB *cur_tab;
> + THD *thd= atab->table->in_use;
> +
> + active_tab= atab;
> + exec_tab= ctab;
> + max_evals= thd->variables.dynamic_pushdown_max_evals;
> + cost_ratio= (double) thd->variables.dynamic_pushdown_cost_ratio / 100.0;
> + sampling_probability= (double) thd->variables.dynamic_pushdown_sampling_probability / 100.0;
> + max_selectivity= (double) thd->variables.dynamic_pushdown_max_selectivity / 100.0;
> +
> + /* Collect the set of all subqueries referenced inside 'cond'. */
> + cond->walk(&Item::collect_item_subselect_processor, 0,
> + (uchar*) &subqueries);
> +
> + /* Find the boundaries between which the predicate can be moved. */
> + if (active_tab->bush_root_tab)
> + {
> + first_tab= active_tab->bush_root_tab->bush_children->start;
> + last_tab= active_tab->bush_root_tab->bush_children->end;
> + }
> + else
> + {
> + first_tab= join->join_tab + join->const_tables;
> + last_tab= join->join_tab + join->top_join_tab_count;
> + }
> +
> + DBUG_ASSERT(active_tab >= first_tab && active_tab < last_tab);
> +
> + /* Precompute some cost information that is static at execution time. */
> + first_tab->exec_fanout= 1;
> + for (cur_tab= first_tab; cur_tab < last_tab; cur_tab++)
> + rel_access_cost+= cur_tab->read_time;
> + rel_access_cost/= first_tab->records_read;
> +
> + /* Copied from Item_func_rand::seed_random, what are the right args? */
> + my_rnd_init(&rand, (uint32) (0x10001L+55555555L),
> + (uint32) (0x10000001L));
> + last_res= -1;
> +}
> +
> +
> +longlong Item_func_dynamic_cond::val_int()
> +{
> + longlong res= 1;
> + THD *thd= active_tab->join->thd;
> +
> + DBUG_ASSERT(active_tab >= first_tab && active_tab < last_tab);
> + DBUG_ASSERT(!(*exec_tab) || (active_tab->join == (*exec_tab)->join));
> + DBUG_ASSERT(!(*exec_tab) || (*exec_tab >= first_tab && *exec_tab < last_tab));
> +
> + if (is_active())
> + {
> + /*
> + If the active tab is the same as the first tab, this method must
> + re-evaluate on each call (that is, not use the cached last_res).
> + */
> + DBUG_ASSERT(*exec_tab > first_tab || last_res == -1);
> + /*
> + If this dynamic condition is attached to the JOIN_TAB currently being
> + executed, evaluate the wrapper condition.
> + */
> + if (last_res == -1)
> + {
> + last_res= Item_func_collect_stats::val_int();
> + /* Choose the best JOIN_TAB where to attach this dynamic condition. */
> + if (optimizer_flag(thd, OPTIMIZER_SWITCH_EXPENSIVE_PRED_DYNAMIC_PUSHDOWN) &&
> + ((ulonglong) count_evals % max_evals == 0))
> + choose_best_join_tab();
> + }
> + res= last_res;
> + last_res= -1;
> + }
> + else if (optimizer_flag(thd, OPTIMIZER_SWITCH_EXPENSIVE_PRED_DYNAMIC_PUSHDOWN) &&
> + (*exec_tab == first_tab) && my_rnd(&rand) <= sampling_probability)
> + {
> + /*
> + If dynamic pushdown is used, evaluate the condition to collect statistics
> + about its selectivity. The result is returned so it can be used to
> + filter join records during sampling.
> + */
> + DBUG_ASSERT(last_res == -1);
> + res= last_res= Item_func_collect_stats::val_int();
> + }
> + return res;
> +}
> +
> +
> +/**
> +*/
> +
> +bool Item_func_dynamic_cond::walk(Item_processor processor, bool walk_subquery,
> + uchar *argument)
> +{
> + return is_active() ?
> + Item_func::walk(processor, walk_subquery, argument) : false;
> +}
> +
> +
> +/**
> + Compute the execution-time selectivity of the condition based on counters.
> +*/
> +
> +double Item_func_dynamic_cond::exec_selectivity()
> +{
> + return is_active() ? Item_func_collect_stats::exec_selectivity() : 1;
> +}
> +
> +
> +/**
> + Compute the total cost of all subqueries wrapped in this dynamic condition.
> +
> + @todo
> + Currently the method returns the compile-time cost. It should instead
> + recompute the cost in the same way as choose_best_join_tab() using execution
> + time statistics.
> +*/
> +
> +double Item_func_dynamic_cond::exec_cost()
> +{
> + if (is_active())
> + {
> + double cost= 0;
> + Item_subselect *subs;
> + List_iterator_fast<Item_subselect> subs_it(subqueries);
> + while ((subs= subs_it++))
> + cost+= subs->get_cost();
> + return cost;
> + }
> + else
> + return 0;
> +}
> +
> +
> +/**
> + Choose the JOIN_TAB in the query plan, where to attach this condition so that
> + that the total plan cost is minimal.
> +*/
> +
> +void Item_func_dynamic_cond::choose_best_join_tab()
> +{
> + JOIN_TAB *cur_tab, *best_tab;
> +
> + /* The current JOI_TAB where this dynamic cond is attached. */
> + JOIN_TAB *active_tab_save= active_tab;
> + /* The JOIN_TAB that is being executed. */
> + JOIN_TAB *exec_tab_save= *exec_tab;
> + double join_fanout;
> + double cond_cost;
> + double cond_sel;
> + double rel_plan_cost;
> + double best_plan_cost= DBL_MAX;
> +
> + /*
> + Update the execution-time approximations of partial join fanouts.
> +
> + TODO: this is very inexact approach, because even if there is a fanout
> + estimate for join_tab J_i, it will be incomparable to the estimate of the
> + previous join tabs if not all rows of J_[i-1] were joined with J_i.
> + */
> + first_tab->exec_fanout= first_tab->exec_partial_join_records ?
> + /* use the execution-time counter */
> + first_tab->exec_partial_join_records :
> + /* use the optimize-time estimate */
> + first_tab->records_read;
> + for (cur_tab= first_tab + 1; cur_tab < last_tab; cur_tab++)
> + {
> + /*
> + If an execution time counter is present, use it, otherwise use the
> + optimize-time estimate.
> + */
> + if (cur_tab->exec_partial_join_records)
> + cur_tab->exec_fanout= cur_tab->exec_partial_join_records /
> + (cur_tab-1)->exec_partial_join_records;
> + else
> + cur_tab->exec_fanout= cur_tab->records_read;
> + }
> +
> + /*
> + Compute the cost of each alternative plan where this dynamic condition is
> + pushed to each possible JOIN_TAB. Currently we only consider moving the
> + condition later in the join plan, that is, to any JOIN_TAB after active_tab.
> +
> + This condition is 'moved' from one JOIN_TAB to the next by looping through
> + all possible 'active_tab' values, starting from the current one.
> + */
> + for (active_tab= active_tab_save; active_tab < last_tab; active_tab++)
> + {
> + rel_plan_cost= 0;
> +
> + /* Recompute the plan cost for the current active_tab. */
> + for (cur_tab= first_tab; cur_tab < last_tab; cur_tab++)
> + {
> + Item_func_collect_stats *pushed_cond;
> + cond_cost= 0;
> + cond_sel= 0;
> + join_fanout= cur_tab->exec_fanout;
> +
> + /*
> + Set the current execution tab. This affects all dynamic conditions of
> + the current JOIN.
> + */
> + *exec_tab= cur_tab;
> +
> + if (cur_tab->select_cond)
> + {
> + /*
> + If a top-level AND condition contains an expensive non-constant
> + subquery, each of the AND terms is wrapped in an item that collectes
> + execution statistics.
> + */
> + DBUG_ASSERT(cur_tab->select_cond->type() == Item::FUNC_ITEM &&
> + (((Item_func*)cur_tab->select_cond)->functype() == Item_func::DYNAMIC_COND_FUNC ||
> + ((Item_func*)cur_tab->select_cond)->functype() == Item_func::EXEC_STATS_FUNC));
> +
> + pushed_cond= (Item_func_collect_stats*) cur_tab->select_cond;
> + cond_cost+= pushed_cond->exec_cost();
> + cond_sel*= pushed_cond->exec_selectivity();
> + }
> +
> + rel_plan_cost+= rel_access_cost + join_fanout * cond_sel * cond_cost;
> + }
> +
> + if (rel_plan_cost / best_plan_cost < cost_ratio)
> + {
> + best_plan_cost= rel_plan_cost;
> + best_tab= active_tab;
> + }
> + }
> +
> + active_tab= best_tab;
> + *exec_tab= exec_tab_save;
> +}
> +
> +
> +void Item_func_dynamic_cond::print(String *str, enum_query_type query_type)
> +{
> + if (is_active())
> + Item_bool_func::print(str, query_type);
> +}
>
> === modified file 'sql/item_cmpfunc.h'
> --- sql/item_cmpfunc.h 2013-07-17 19:24:29 +0000
> +++ sql/item_cmpfunc.h 2013-09-27 08:25:14 +0000
> @@ -125,6 +125,12 @@ class Item_bool_func :public Item_int_fu
> bool is_bool_func() { return 1; }
> void fix_length_and_dec() { decimals=0; max_length=1; }
> uint decimal_precision() const { return 1; }
> + /*
> + The execution logic must guarantee that these methods are called only
> + for classes that provide real implementations. In this way we can use
> + polymorphism without implementing the methods for all subclasses.
> + */
> + virtual double exec_selectivity() { DBUG_ASSERT(FALSE); return 0; }
> };
>
>
> @@ -258,13 +264,12 @@ class Item_in_optimizer: public Item_boo
> void cleanup();
> const char *func_name() const { return "<in_optimizer>"; }
> Item_cache **get_cache() { return &cache; }
> + Item *left_expr() { return args[0]; }
> void keep_top_level_cache();
> Item *transform(Item_transformer transformer, uchar *arg);
> virtual Item *expr_cache_insert_transformer(uchar *thd_arg);
> bool is_expensive_processor(uchar *arg);
> bool is_expensive();
> - void set_join_tab_idx(uint join_tab_idx_arg)
> - { args[1]->set_join_tab_idx(join_tab_idx_arg); }
> virtual void get_cache_parameters(List<Item> ¶meters);
> bool is_top_level_item();
> bool eval_not_null_tables(uchar *opt_arg);
> @@ -492,8 +497,139 @@ class Item_func_trig_cond: public Item_b
> const char *func_name() const { return "trigcond"; };
> bool const_item() const { return FALSE; }
> bool *get_trig_var() { return trig_var; }
> + void update_used_tables()
> + {
> + Item_bool_func::update_used_tables();
> + const_item_cache= false;
> + used_tables_cache|= OUTER_REF_TABLE_BIT;
> + }
> +};
> +
> +
> +/**
> + Item wrapper that collects execution-time statistics about its child Item.
> +
> + This class can be used only during execution.
> +*/
> +
> +class Item_func_collect_stats : public Item_bool_func
> +{
> +protected:
> + double count_true; /* The number of times the predicate was true. */
> + double count_evals; /* The number of times the predicate was evaluated. */
> +public:
> + Item_func_collect_stats(Item *cond)
> + : Item_bool_func(cond), count_true(0), count_evals(0)
> + {
> + DBUG_ASSERT(cond && cond->fixed);
> + /*
> + It doesn't make sense to wrap an item that is already capable of
> + collecting execution-time statistitcs.
> + */
> + DBUG_ASSERT(cond->type() != Item::FUNC_ITEM ||
> + (((Item_func*)cond)->functype() != Item_func::EXEC_STATS_FUNC &&
> + ((Item_func*)cond)->functype() != Item_func::DYNAMIC_COND_FUNC));
> + fixed= true;
> + }
> + enum Functype functype() const { return EXEC_STATS_FUNC; };
> + const char *func_name() const { return "exec_stats"; };
> + bool const_item() const { return FALSE; }
> + longlong val_int()
> + {
> + longlong res= args[0]->val_int();
> + ++count_evals;
> + if (res)
> + ++count_true;
> + return res;
> + }
> + virtual double exec_selectivity();
> + virtual double exec_cost();
> +
> +#ifndef DBUG_OFF
> + void dbug_print_stats();
> +#endif
> +};
> +
> +
> +/**
> + A wrapper class that allows to change dynamically (during execution) the
> + JOIN_TAB that a condition is attached to.
> +*/
> +
> +class Item_func_dynamic_cond : public Item_func_collect_stats
> +{
> +protected:
> + List<Item_subselect> subqueries;
> + /* The join_tab where the condition is currently 'pushed'.*/
> + st_join_table *active_tab;
> + /* The join_tab currently being executed. */
> + st_join_table **exec_tab;
> + /* The boundaries of the sub-plan where this predicate can be pushed. */
> + st_join_table *first_tab, *last_tab;
> + /*
> + The relative access cost based on optimizer estimates. This is the cost
> + of all access methods per one record from the first table in the plan.
> + */
> + double rel_access_cost;
> + /*
> + The number of times a dynamic condition must be evaluated in
> + order to trigger dynamic pushdown". Cached copy of system variable
> + dynamic_pushdown_max_evals.
> + */
> + ha_rows max_evals;
> + /*
> + The ratio by which a new placement of the predicate in the plan
> + must improve the plan cost in order to trigger dynamic pushdown.
> + Derived from system variable dynamic_pushdown_cost_ratio.
> + */
> + double cost_ratio;
> + /*
> + The probability with which to evaluate this condition in order to
> + estimate its selectivity.
> + */
> + double sampling_probability;
> + /*
> + If this predicate's selectivity is below this limit, it is considered
> + sufficiently selective to re-filter all records in a partial join.
> + The selectivity itself is estimated by sampling the predicate with
> + 'sampling_probability'.
> + */
> + double max_selectivity;
> + /*
> + Random number generator state.
> + @todo Consider making it static.
> + */
> + struct my_rnd_struct rand;
> + /*
> + The result of the last evaluation, -1 denotes an invalid result which
> + requires a re-evaluation. This member is needed because the same Item
> + may be evaluated more than once for the same partial join record due to:
> + - sampling - then the item is "active" both for the left-most pushdown
> + join_tab, and for the "active" join_tab.
> + - moving the item to a later join_tab.
> + */
> + longlong last_res;
> +
> +protected:
> + bool is_active()
> + {
> + DBUG_ASSERT(*exec_tab);
> + return (active_tab == *exec_tab);
> + }
> +public:
> + Item_func_dynamic_cond(Item *cond, st_join_table *atab, st_join_table **ctab);
> + longlong val_int();
> + enum Functype functype() const { return DYNAMIC_COND_FUNC; };
> + const char *func_name() const { return "dyncond"; };
> + bool walk(Item_processor processor, bool walk_subquery, uchar *argument);
> + virtual double exec_selectivity();
> + virtual double exec_cost();
> + void choose_best_join_tab();
> +
> + virtual void print(String *str, enum_query_type query_type);
> };
>
> +
> class Item_func_not_all :public Item_func_not
> {
> /* allow to check presence of values in max/min optimization */
> @@ -1609,6 +1745,7 @@ class Item_cond :public Item_bool_func
> Item *compile(Item_analyzer analyzer, uchar **arg_p,
> Item_transformer transformer, uchar *arg_t);
> bool eval_not_null_tables(uchar *opt_arg);
> + virtual double exec_cost();
> };
>
> template <template<class> class LI, class T> class Item_equal_iterator;
> @@ -1922,6 +2059,9 @@ class Item_cond_and :public Item_cond
>
> inline bool is_cond_and(Item *item)
> {
> + if (!item)
> + return FALSE;
> +
> if (item->type() != Item::COND_ITEM)
> return FALSE;
>
>
> === modified file 'sql/item_func.h'
> --- sql/item_func.h 2013-08-20 11:48:29 +0000
> +++ sql/item_func.h 2013-09-27 08:25:14 +0000
> @@ -64,7 +64,7 @@ class Item_func :public Item_result_fiel
> SP_STARTPOINT,SP_ENDPOINT,SP_EXTERIORRING,
> SP_POINTN,SP_GEOMETRYN,SP_INTERIORRINGN,
> NOT_FUNC, NOT_ALL_FUNC,
> - NOW_FUNC, TRIG_COND_FUNC,
> + NOW_FUNC, TRIG_COND_FUNC, DYNAMIC_COND_FUNC, EXEC_STATS_FUNC,
> SUSERVAR_FUNC, GUSERVAR_FUNC, COLLATE_FUNC,
> EXTRACT_FUNC, CHAR_TYPECAST_FUNC, FUNC_SP, UDF_FUNC,
> NEG_FUNC, GSYSVAR_FUNC, DYNCOL_FUNC };
>
> === modified file 'sql/item_subselect.cc'
> --- sql/item_subselect.cc 2013-07-17 19:24:29 +0000
> +++ sql/item_subselect.cc 2013-09-23 15:47:28 +0000
> @@ -50,8 +50,8 @@ int check_and_do_in_subquery_rewrites(JO
> Item_subselect::Item_subselect():
> Item_result_field(), value_assigned(0), own_engine(0), thd(0), old_engine(0),
> used_tables_cache(0), have_to_be_excluded(0), const_item_cache(1),
> - inside_first_fix_fields(0), done_first_fix_fields(FALSE),
> - expr_cache(0), forced_const(FALSE), substitution(0), engine(0), eliminated(FALSE),
> + inside_first_fix_fields(0), done_first_fix_fields(FALSE), expr_cache(0),
> + forced_const(FALSE), substitution(0), engine(0), eliminated(FALSE),
> changed(0), is_correlated(FALSE)
> {
> DBUG_ENTER("Item_subselect::Item_subselect");
> @@ -400,6 +400,21 @@ bool Item_subselect::set_fake_select_as_
> }
>
>
> +bool Item_subselect::collect_item_subselect_processor(uchar * arg)
> +{
> + List<Item_subselect> *subs_list= (List<Item_subselect>*) arg;
> + List_iterator<Item_subselect> subs_list_it(*subs_list);
> + Item_subselect *cur_subs;
> + while ((cur_subs= subs_list_it++))
> + {
> + if (cur_subs->eq(this, 1))
> + return false; /* Already in the set. */
> + }
> + subs_list->push_back(this);
> + return false;
> +}
> +
> +
> bool Item_subselect::mark_as_dependent(THD *thd, st_select_lex *select,
> Item *item)
> {
> @@ -551,7 +566,7 @@ void Item_subselect::recalc_used_tables(
> */
> bool Item_subselect::is_expensive()
> {
> - double examined_rows= 0;
> + double examined_rows= 0, cur_examined_rows= 0;
>
> for (SELECT_LEX *sl= unit->first_select(); sl; sl= sl->next_select())
> {
> @@ -560,19 +575,20 @@ bool Item_subselect::is_expensive()
> continue;
>
> /*
> - Subqueries whose result is known after optimization are not expensive.
> - Such subqueries have all tables optimized away, thus have no join plan.
> + The cost of a subquery is known if it was optimized, and it is either
> + executable, or was already executed and cleaned up.
> */
> - if (cur_join->optimized &&
> - (cur_join->zero_result_cause || !cur_join->tables_list))
> - return false;
> + if (cur_join->have_query_plan != JOIN::QEP_EXECUTABLE &&
> + cur_join->have_query_plan != JOIN::QEP_DELETED)
> + return true;
>
> /*
> - If a subquery is not optimized we cannot estimate its cost. A subquery is
> - considered optimized if it has a join plan.
> + If JOIN::optimize determined that the subquery has an empty result, or
> + this is a table-less subquery, then then subquery is cheap.
> */
> - if (!(cur_join->optimized && cur_join->join_tab))
> - return true;
> + if (!cur_join->table_count || !cur_join->tables_list ||
> + cur_join->zero_result_cause)
> + return false;
>
> if (sl->first_inner_unit())
> {
> @@ -583,13 +599,64 @@ bool Item_subselect::is_expensive()
> return true;
> }
>
> - examined_rows+= cur_join->get_examined_rows();
> + if ((cur_examined_rows= cur_join->get_examined_rows_estimate()) < 0)
> + return true;
psergey: This looks very weird. Perhaphs, it is better to add a constant:
const ha_rows NO_EXAMINED_ROWS_ESTIMATE_YET=-1;
> + examined_rows+= cur_examined_rows;
> }
>
> return (examined_rows > thd->variables.expensive_subquery_limit);
> }
>
>
> +/**
> + Compute the cost of a subquery and all its subqueries.
> +
> + @return total subquery cost in terms of JOIN::best_read
> +*/
> +
> +double Item_subselect::get_cost()
> +{
> + double total_cost= 0;
> + for (SELECT_LEX *sl= unit->first_select(); sl; sl= sl->next_select())
> + {
> + JOIN *inner_join= sl->join;
> + total_cost+=
> + inner_join->best_read + /* The cost of this subquery. */
> + /* The cost of the subqueries of the subquery. */
> + inner_join->total_subquery_cost();
> + }
> + return total_cost;
> +}
> +
> +
> +/**
> + Check if the subquery predicate has only one possible execution strategy.
> +
> + @details
> + If there is only one possible execution strategy for a subquery predicate,
> + the optimizer doesn't need to know the number of times the subquery will be
> + executed in order to choose the better strategy. In this case the underlying
> + subquery can be optimized before optimizing the outer query.
> +
> + @retval FALSE more than one strategies are possible
> + @retval TRUE only one strategy is possible
> +*/
> +
> +bool Item_subselect::has_single_strategy() const
> +{
> + if (!is_in_predicate())
> + return true;
> + else
> + {
> + Item_in_subselect *in_subs= (Item_in_subselect*) this;
> + if (!in_subs->test_strategy(SUBS_MATERIALIZATION) ||
> + !in_subs->test_strategy(SUBS_IN_TO_EXISTS))
> + return true;
> + }
> + return false;
> +}
> +
> +
> bool Item_subselect::walk(Item_processor processor, bool walk_subquery,
> uchar *argument)
> {
> @@ -835,6 +902,10 @@ bool Item_in_subselect::exec()
> test_if_item_cache_changed(*left_expr_cache) < 0)
> DBUG_RETURN(FALSE);
>
> + if (first_execution && test_strategy(SUBS_MATERIALIZATION) &&
> + engine->engine_type() == subselect_engine::SINGLE_SELECT_ENGINE)
> + setup_mat_engine();
> +
> /*
> The exec() method below updates item::value, and item::null_value, thus if
> we don't call it, the next call to item::val_int() will return whatever
> @@ -888,6 +959,12 @@ void Item_subselect::update_used_tables(
> if (!(used_tables_cache & ~engine->upper_select_const_tables()))
> const_item_cache= 1;
> }
> + /*
> + TODO: when merged with dynamic pushdown, pushdown_tables should not be
> + added to the used tables. Instead, dynamic pushdown could use this table
> + as the initial dynamic position, or it may ignore this estimate altogether.
> + used_tables_cache|= pushdown_tables;
> + */
> }
> }
>
> @@ -896,6 +973,16 @@ void Item_subselect::print(String *str,
> {
> if (engine)
> {
> + /*
> + For non-SJ materialization the engine is created during exec(), therefore
> + it is not available during EXPLAIN. Create the materialization engine in
> + order to print it.
> + */
> + if (is_in_predicate() &&
> + ((Item_in_subselect*)this)->test_strategy(SUBS_MATERIALIZATION) &&
> + engine->engine_type() == subselect_engine::SINGLE_SELECT_ENGINE)
> + ((Item_in_subselect*)this)->setup_mat_engine();
> +
> str->append('(');
> engine->print(str, query_type);
> str->append(')');
> @@ -1921,7 +2008,7 @@ bool Item_allany_subselect::transform_in
> The swap is needed for expressions of type 'f1 < ALL ( SELECT ....)'
> where we want to evaluate the sub query even if f1 would be null.
> */
> - subs= func->create_swap(*(optimizer->get_cache()), subs);
> + subs= func->create_swap(optimizer->left_expr(), subs);
> thd->change_item_tree(place, subs);
> if (subs->fix_fields(thd, &subs))
> DBUG_RETURN(true);
> @@ -3174,8 +3261,13 @@ void Item_in_subselect::update_used_tabl
> {
> Item_subselect::update_used_tables();
> left_expr->update_used_tables();
> - //used_tables_cache |= left_expr->used_tables();
> used_tables_cache= Item_subselect::used_tables() | left_expr->used_tables();
> + /*
> + Constant optimization of the outer query may transform the left IN operand
> + into a constant. The subquery is no longer uncacheable.
> + */
> + if (!used_tables_cache)
> + engine->make_cacheable();
> }
>
>
> @@ -4128,12 +4220,24 @@ uint8 subselect_single_select_engine::un
> }
>
>
> +void subselect_single_select_engine::make_cacheable()
> +{
> + select_lex->uncacheable= 0;
> +}
> +
> +
> uint8 subselect_union_engine::uncacheable()
> {
> return unit->uncacheable;
> }
>
>
> +void subselect_union_engine::make_cacheable()
> +{
> + unit->uncacheable= 0;
> +}
> +
> +
> void subselect_single_select_engine::exclude()
> {
> select_lex->master_unit()->exclude_level();
> @@ -5045,7 +5149,7 @@ double get_fanout_with_deps(JOIN *join,
> !tab->emb_sj_nest &&
> tab->records_read != 0)
> {
> - fanout *= rows2double(tab->records_read);
> + fanout *= tab->records_read;
> }
> }
> return fanout;
> @@ -5234,7 +5338,7 @@ int subselect_hash_sj_engine::exec()
> materialize_join->exec();
> if ((res= test(materialize_join->error || thd->is_fatal_error ||
> thd->is_error())))
> - goto err;
> + goto end;
>
> /*
> TODO:
> @@ -5259,7 +5363,8 @@ int subselect_hash_sj_engine::exec()
> item_in->reset();
> item_in->make_const();
> item_in->set_first_execution();
> - DBUG_RETURN(FALSE);
> + res= 0;
> + goto end;
> }
>
> /*
> @@ -5302,7 +5407,8 @@ int subselect_hash_sj_engine::exec()
> item_in->null_value= 1;
> item_in->make_const();
> item_in->set_first_execution();
> - DBUG_RETURN(FALSE);
> + res= 0;
> + goto end;
> }
>
> if (has_covering_null_row)
> @@ -5360,7 +5466,7 @@ int subselect_hash_sj_engine::exec()
> {
> /* This is an irrecoverable error. */
> res= 1;
> - goto err;
> + goto end;
> }
> }
> }
> @@ -5369,7 +5475,7 @@ int subselect_hash_sj_engine::exec()
> lookup_engine= pm_engine;
> item_in->change_engine(lookup_engine);
>
> -err:
> +end:
> thd->lex->current_select= save_select;
> DBUG_RETURN(res);
> }
>
> === modified file 'sql/item_subselect.h'
> --- sql/item_subselect.h 2013-03-27 22:41:02 +0000
> +++ sql/item_subselect.h 2013-09-23 15:47:28 +0000
> @@ -133,13 +133,14 @@ class Item_subselect :public Item_result
>
> Item_subselect();
>
> - virtual subs_type substype() { return UNKNOWN_SUBS; }
> - bool is_in_predicate()
> + virtual subs_type substype() const { return UNKNOWN_SUBS; }
> + bool is_in_predicate() const
> {
> return (substype() == Item_subselect::IN_SUBS ||
> substype() == Item_subselect::ALL_SUBS ||
> substype() == Item_subselect::ANY_SUBS);
> }
> + bool has_single_strategy() const;
>
> /*
> We need this method, because some compilers do not allow 'this'
> @@ -182,6 +183,7 @@ class Item_subselect :public Item_result
> void make_const()
> {
> used_tables_cache= 0;
> + //pushdown_tables= 0;
> const_item_cache= 0;
> forced_const= TRUE;
> }
> @@ -234,6 +236,8 @@ class Item_subselect :public Item_result
> @retval FALSE otherwise
> */
> bool is_expensive_processor(uchar *arg) { return is_expensive(); }
> + virtual double get_cost();
> + bool collect_item_subselect_processor(uchar * arg);
>
> /**
> Get the SELECT_LEX structure associated with this Item.
> @@ -261,6 +265,8 @@ class Item_subselect :public Item_result
> friend bool convert_join_subqueries_to_semijoins(JOIN *join);
> };
>
> +typedef bool (Item_subselect::*subselect_filter)() const;
> +
> /* single value subselect */
>
> class Item_cache;
> @@ -274,7 +280,7 @@ class Item_singlerow_subselect :public I
> {}
>
> void cleanup();
> - subs_type substype() { return SINGLEROW_SUBS; }
> + subs_type substype() const { return SINGLEROW_SUBS; }
>
> void reset();
> void no_rows_in_result();
> @@ -370,7 +376,7 @@ class Item_exists_subselect :public Item
> emb_on_expr_nest(NULL), optimizer(0), exists_transformed(0)
> {}
>
> - subs_type substype() { return EXISTS_SUBS; }
> + subs_type substype() const { return EXISTS_SUBS; }
> void reset()
> {
> eliminated= FALSE;
> @@ -577,7 +583,7 @@ class Item_in_subselect :public Item_exi
> pushed_cond_guards(NULL), func(NULL), is_jtbm_merged(FALSE),
> is_jtbm_const_tab(FALSE), upper_item(0) {}
> void cleanup();
> - subs_type substype() { return IN_SUBS; }
> + subs_type substype() const { return IN_SUBS; }
> void reset()
> {
> eliminated= FALSE;
> @@ -717,7 +723,7 @@ class Item_allany_subselect :public Item
>
> void cleanup();
> // only ALL subquery has upper not
> - subs_type substype() { return all?ALL_SUBS:ANY_SUBS; }
> + subs_type substype() const { return all?ALL_SUBS:ANY_SUBS; }
> bool select_transformer(JOIN *join);
> void create_comp_func(bool invert) { func= func_creator(invert); }
> virtual void print(String *str, enum_query_type query_type);
> @@ -788,6 +794,7 @@ class subselect_engine: public Sql_alloc
> virtual int exec()= 0;
> virtual uint cols()= 0; /* return number of columns in select */
> virtual uint8 uncacheable()= 0; /* query is uncacheable */
> + virtual void make_cacheable() {}
> enum Item_result type() { return res_type; }
> enum Item_result cmptype() { return cmp_type; }
> enum_field_types field_type() { return res_field_type; }
> @@ -828,6 +835,7 @@ class subselect_single_select_engine: pu
> int exec();
> uint cols();
> uint8 uncacheable();
> + void make_cacheable();
> void exclude();
> table_map upper_select_const_tables();
> virtual void print (String *str, enum_query_type query_type);
> @@ -862,6 +870,7 @@ class subselect_union_engine: public sub
> int exec();
> uint cols();
> uint8 uncacheable();
> + void make_cacheable();
> void exclude();
> table_map upper_select_const_tables();
> virtual void print (String *str, enum_query_type query_type);
>
> === modified file 'sql/opt_subselect.cc'
> --- sql/opt_subselect.cc 2013-07-17 19:24:29 +0000
> +++ sql/opt_subselect.cc 2013-05-28 12:22:32 +0000
> @@ -3620,7 +3620,10 @@ bool setup_sj_materialization_part2(JOIN
> emb_sj_nest->sj_subq_pred)))
> DBUG_RETURN(TRUE); /* purecov: inspected */
> sjm_tab->type= JT_EQ_REF;
> - sjm_tab->select_cond= sjm->in_equality;
> + remove_sj_conds(&sjm_tab->select_cond);
> + sjm_tab->select_cond= and_items(sjm_tab->select_cond, sjm->in_equality);
> + if (!sjm_tab->select_cond->fixed)
> + sjm_tab->select_cond->fix_fields(thd, &sjm_tab->select_cond);
> }
> else
> {
> @@ -4907,7 +4910,7 @@ static void remove_subq_pushed_predicate
>
>
> /**
> - Optimize all subqueries of a query that were not flattened into a semijoin.
> + Optimize all non-IN subqueries that were not flattened into a semijoin.
>
> @details
> Optimize all immediate children subqueries of a query.
> @@ -4924,7 +4927,12 @@ static void remove_subq_pushed_predicate
>
> bool JOIN::optimize_unflattened_subqueries()
> {
> - return select_lex->optimize_unflattened_subqueries(false);
> + return select_lex->optimize_subqueries(&Item_subselect::has_single_strategy);
> +}
> +
> +bool JOIN::optimize_in_subqueries()
> +{
> + return select_lex->optimize_subqueries(NULL);
> }
>
> /**
> @@ -4958,7 +4966,7 @@ bool JOIN::optimize_constant_subqueries(
> not for EXPLAIN.
> */
> select_lex->options&= ~SELECT_DESCRIBE;
> - res= select_lex->optimize_unflattened_subqueries(true);
> + res= select_lex->optimize_subqueries(&Item_subselect::const_item);
> select_lex->options= save_options;
> return res;
> }
> @@ -5242,9 +5250,13 @@ bool setup_jtbm_semi_joins(JOIN *join, L
> {
> DBUG_ASSERT(subq_pred->test_set_strategy(SUBS_MATERIALIZATION));
> subq_pred->is_jtbm_const_tab= FALSE;
> +
> + DBUG_ASSERT(subq_pred->engine->engine_type() == subselect_engine::SINGLE_SELECT_ENGINE);
> + subq_pred->setup_mat_engine();
> + DBUG_ASSERT(subq_pred->engine->engine_type() == subselect_engine::HASH_SJ_ENGINE);
> subselect_hash_sj_engine *hash_sj_engine=
> - ((subselect_hash_sj_engine*)item->engine);
> -
> + ((subselect_hash_sj_engine*) subq_pred->engine);
> +
> table->table= hash_sj_engine->tmp_table;
> table->table->pos_in_table_list= table;
>
> @@ -5464,27 +5476,6 @@ bool JOIN::choose_subquery_plan(table_ma
> DBUG_PRINT("info",("outer_lookup_keys: %.2f", outer_lookup_keys));
> }
>
> - /*
> - If (1) materialization is a possible strategy based on semantic analysis
> - during the prepare phase, then if
> - (2) it is more expensive than the IN->EXISTS transformation, and
> - (3) it is not possible to create usable indexes for the materialization
> - strategy,
> - fall back to IN->EXISTS.
> - otherwise
> - use materialization.
> - */
> - if (in_subs->test_strategy(SUBS_MATERIALIZATION) &&
> - in_subs->setup_mat_engine())
> - {
> - /*
> - If materialization was the cheaper or the only user-selected strategy,
> - but it is not possible to execute it due to limitations in the
> - implementation, fall back to IN-TO-EXISTS.
> - */
> - in_subs->set_strategy(SUBS_IN_TO_EXISTS);
> - }
> -
> if (in_subs->test_strategy(SUBS_MATERIALIZATION))
> {
> /* Restore the original query plan used for materialization. */
>
> === modified file 'sql/sql_class.h'
> --- sql/sql_class.h 2013-09-13 16:14:56 +0000
> +++ sql/sql_class.h 2013-09-27 08:25:14 +0000
> @@ -508,6 +508,10 @@ typedef struct system_variables
> ulong optimizer_search_depth;
> ulong optimizer_selectivity_sampling_limit;
> ulong optimizer_use_condition_selectivity;
> + ha_rows dynamic_pushdown_max_evals;
> + uint dynamic_pushdown_cost_ratio;
> + uint dynamic_pushdown_sampling_probability;
> + uint dynamic_pushdown_max_selectivity;
> ulong use_stat_tables;
> ulong histogram_size;
> ulong histogram_type;
>
> === modified file 'sql/sql_delete.cc'
> --- sql/sql_delete.cc 2013-08-06 20:33:18 +0000
> +++ sql/sql_delete.cc 2013-04-01 10:36:05 +0000
> @@ -126,7 +126,7 @@ bool mysql_delete(THD *thd, TABLE_LIST *
> }
>
> /* Apply the IN=>EXISTS transformation to all subqueries and optimize them. */
> - if (select_lex->optimize_unflattened_subqueries(false))
> + if (select_lex->optimize_subqueries(NULL))
> DBUG_RETURN(TRUE);
>
> const_cond= (!conds || conds->const_item());
>
> === modified file 'sql/sql_join_cache.cc'
> --- sql/sql_join_cache.cc 2013-05-05 18:39:31 +0000
> +++ sql/sql_join_cache.cc 2013-09-27 08:25:14 +0000
> @@ -2080,6 +2080,7 @@ enum_nested_loop_state JOIN_CACHE::join_
> if (outer_join_first_inner && !join_tab->first_unmatched)
> join_tab->not_null_compl= TRUE;
>
> + *(join->cur_exec_tab)= join_tab;
psergey: Do you still remember why you had to add this? It would be nice to put
in a comment.
> if (!join_tab->first_unmatched)
> {
> /* Find all records from join_tab that match records from join buffer */
>
> === modified file 'sql/sql_lex.cc'
> --- sql/sql_lex.cc 2013-08-18 19:29:06 +0000
> +++ sql/sql_lex.cc 2013-04-01 10:36:05 +0000
> @@ -3461,7 +3461,7 @@ bool st_select_lex::add_index_hint (THD
> @retval TRUE error occurred.
> */
>
> -bool st_select_lex::optimize_unflattened_subqueries(bool const_only)
> +bool st_select_lex::optimize_subqueries(subselect_filter filter)
> {
> for (SELECT_LEX_UNIT *un= first_inner_unit(); un; un= un->next_unit())
> {
> @@ -3476,11 +3476,8 @@ bool st_select_lex::optimize_unflattened
> continue;
> }
>
> - if (const_only && !subquery_predicate->const_item())
> - {
> - /* Skip non-constant subqueries if the caller asked so. */
> + if (filter && !((subquery_predicate->*filter)()))
> continue;
> - }
>
> bool empty_union_result= true;
> bool is_correlated_unit= false;
> @@ -4204,7 +4201,9 @@ int st_select_lex::print_explain(select_
> bool *printed_anything)
> {
> int res;
> - if (join && join->have_query_plan == JOIN::QEP_AVAILABLE)
> + if (join &&
> + (join->have_query_plan == JOIN::QEP_EXPLAINABLE ||
> + join->have_query_plan == JOIN::QEP_EXECUTABLE))
> {
> /*
> There is a number of reasons join can be marked as degenerate, so all
>
> === modified file 'sql/sql_lex.h'
> --- sql/sql_lex.h 2013-07-17 19:24:29 +0000
> +++ sql/sql_lex.h 2013-04-01 10:36:05 +0000
> @@ -1026,7 +1026,7 @@ class st_select_lex: public st_select_le
>
> void clear_index_hints(void) { index_hints= NULL; }
> bool is_part_of_union() { return master_unit()->is_union(); }
> - bool optimize_unflattened_subqueries(bool const_only);
> + bool optimize_subqueries(subselect_filter filter);
> /* Set the EXPLAIN type for this subquery. */
> void set_explain_type(bool on_the_fly);
> bool handle_derived(LEX *lex, uint phases);
>
> === modified file 'sql/sql_priv.h'
> --- sql/sql_priv.h 2013-07-17 19:24:29 +0000
> +++ sql/sql_priv.h 2013-09-27 08:25:14 +0000
> @@ -228,7 +228,8 @@ template <class T> bool valid_buffer_ran
> #define OPTIMIZER_SWITCH_TABLE_ELIMINATION (1ULL << 26)
> #define OPTIMIZER_SWITCH_EXTENDED_KEYS (1ULL << 27)
> #define OPTIMIZER_SWITCH_EXISTS_TO_IN (1ULL << 28)
> -#define OPTIMIZER_SWITCH_USE_CONDITION_SELECTIVITY (1ULL << 29)
> +#define OPTIMIZER_SWITCH_EXPENSIVE_PRED_STATIC_PUSHDOWN (1ULL << 29)
> +#define OPTIMIZER_SWITCH_EXPENSIVE_PRED_DYNAMIC_PUSHDOWN (1ULL << 30)
>
> #define OPTIMIZER_SWITCH_DEFAULT (OPTIMIZER_SWITCH_INDEX_MERGE | \
> OPTIMIZER_SWITCH_INDEX_MERGE_UNION | \
> @@ -250,7 +251,7 @@ template <class T> bool valid_buffer_ran
> OPTIMIZER_SWITCH_SUBQUERY_CACHE | \
> OPTIMIZER_SWITCH_SEMIJOIN | \
> OPTIMIZER_SWITCH_FIRSTMATCH | \
> - OPTIMIZER_SWITCH_LOOSE_SCAN )
> + OPTIMIZER_SWITCH_LOOSE_SCAN)
> /*
> Replication uses 8 bytes to store SQL_MODE in the binary log. The day you
> use strictly more than 64 bits by adding one more define above, you should
>
> === modified file 'sql/sql_select.cc'
> --- sql/sql_select.cc 2013-08-20 11:48:29 +0000
> +++ sql/sql_select.cc 2013-09-27 13:22:36 +0000
> @@ -200,14 +200,12 @@ int join_read_always_key_or_null(JOIN_TA
> int join_read_next_same_or_null(READ_RECORD *info);
> static COND *make_cond_for_table(THD *thd, Item *cond,table_map table,
> table_map used_table,
> - int join_tab_idx_arg,
> bool exclude_expensive_cond,
> bool retain_ref_cond);
> static COND *make_cond_for_table_from_pred(THD *thd, Item *root_cond,
> Item *cond,
> table_map tables,
> table_map used_table,
> - int join_tab_idx_arg,
> bool exclude_expensive_cond,
> bool retain_ref_cond);
>
> @@ -334,6 +332,50 @@ bool dbug_user_var_equals_int(THD *thd,
> }
> return FALSE;
> }
> +
> +
> +/**
> + Print query execution statistics.
> +*/
> +
> +void print_execution_stats(JOIN *join)
> +{
> + JOIN_TAB *first_tab, *tab;
> +
> + DBUG_LOCK_FILE;
> +
> + fprintf(DBUG_FILE, "\nexecution_statistics {\n");
> +
> + first_tab= first_breadth_first_tab(join, WALK_OPTIMIZATION_TABS);
> + for (tab= first_tab; tab;
> + tab= next_breadth_first_tab(join, WALK_OPTIMIZATION_TABS, tab))
> + {
> + if (tab->bush_root_tab)
> + continue; /* TODO: how to process nested tabs?*/
> + fprintf(DBUG_FILE, "%s: actual_card: %f, est_card: %g, est_sel: %f\n",
> + tab->table->alias.c_ptr(),
> + tab->exec_partial_join_records,
> + tab->partial_join_cardinality,
> + tab->cond_selectivity);
> +
> + if (!tab->select_cond)
> + continue;
> +
> + fprintf(DBUG_FILE, " dynamic_cond_stats:\n");
> + if (tab->select_cond->type() == Item::FUNC_ITEM &&
> + (((Item_func*)tab->select_cond)->functype() == Item_func::DYNAMIC_COND_FUNC ||
> + ((Item_func*)tab->select_cond)->functype() == Item_func::EXEC_STATS_FUNC))
> + {
> + Item_func_collect_stats *stat_cond= (Item_func_collect_stats*) tab->select_cond;
> + stat_cond->dbug_print_stats();
> + }
> + }
> + fprintf(DBUG_FILE, "}\n");
> + fprintf(DBUG_FILE, "\n");
> +
> + DBUG_UNLOCK_FILE;
> +}
> +
> #endif
>
>
> @@ -1013,9 +1055,16 @@ int JOIN::optimize()
> short-circuit because optimized==TRUE.
> */
> if (!res && have_query_plan != QEP_DELETED)
> - have_query_plan= QEP_AVAILABLE;
> + {
> + if (have_query_plan == QEP_INCOMPLETE)
> + have_query_plan= QEP_EXPLAINABLE;
> + else
> + have_query_plan= QEP_EXECUTABLE;
> + }
> return res;
> }
> +
> +
> /**
> global select optimisation.
>
> @@ -1046,6 +1095,7 @@ JOIN::optimize_inner()
>
> set_allowed_join_cache_types();
> need_distinct= TRUE;
> + examined_rows_estimate= -1;
>
> /* Run optimize phase for all derived tables/views used in this SELECT. */
> if (select_lex->handle_derived(thd->lex, DT_OPTIMIZE))
> @@ -1282,8 +1332,7 @@ TODO: make view to decide if it is possi
> if (conds && !(thd->lex->describe & DESCRIBE_EXTENDED))
> {
> COND *table_independent_conds=
> - make_cond_for_table(thd, conds, PSEUDO_TABLE_BITS, 0, -1,
> - FALSE, FALSE);
> + make_cond_for_table(thd, conds, PSEUDO_TABLE_BITS, 0, FALSE, FALSE);
> DBUG_EXECUTE("where",
> print_where(table_independent_conds,
> "where after opt_sum_query()",
> @@ -1296,6 +1345,11 @@ TODO: make view to decide if it is possi
> {
> DBUG_PRINT("info",("No tables"));
> error= 0;
> + /*
> + Even if there are no tables the query has some small cost. Use the default
> + cost of all 'cheap' items.
> + */
> + best_read= (1 / (double)TIME_FOR_COMPARE);
> goto setup_subq_exit;
> }
> error= -1; // Error is sent to client
> @@ -1730,12 +1784,18 @@ TODO: make view to decide if it is possi
> if (!(select_options & SELECT_DESCRIBE))
> init_ftfuncs(thd, select_lex, test(order));
>
> - if (optimize_unflattened_subqueries())
> + if (optimize_in_subqueries())
> DBUG_RETURN(1);
>
> int res;
> if ((res= rewrite_to_index_subquery_engine(this)) != -1)
> + {
> + /*
> + This subquery has been rewritten to direct index access via handler calls
> + instead of going through JOIN::exec.
> + */
> DBUG_RETURN(res);
> + }
> if (setup_subquery_caches())
> DBUG_RETURN(-1);
>
> @@ -1863,7 +1923,7 @@ TODO: make view to decide if it is possi
> Even with zero matching rows, subqueries in the HAVING clause may
> need to be evaluated if there are aggregate functions in the query.
> */
> - if (optimize_unflattened_subqueries())
> + if (optimize_unflattened_subqueries() || optimize_in_subqueries())
> DBUG_RETURN(1);
> error= 0;
>
> @@ -2015,6 +2075,10 @@ int JOIN::init_execution()
> DBUG_RETURN(-1); /* purecov: inspected */
> }
>
> +
> + if (setup_dynamic_conditions())
> + DBUG_RETURN(1);
> +
> DBUG_RETURN(0);
> }
>
> @@ -2845,8 +2909,7 @@ void JOIN::exec_inner()
> curr_table->table->map);
>
> Item* sort_table_cond= make_cond_for_table(thd, curr_join->tmp_having,
> - used_tables,
> - (table_map)0, -1,
> + used_tables, (table_map)0,
> FALSE, FALSE);
> if (sort_table_cond)
> {
> @@ -2886,7 +2949,7 @@ void JOIN::exec_inner()
> QT_ORDINARY););
> curr_join->tmp_having= make_cond_for_table(thd, curr_join->tmp_having,
> ~ (table_map) 0,
> - ~used_tables, -1,
> + ~used_tables,
> FALSE, FALSE);
> DBUG_EXECUTE("where",print_where(curr_join->tmp_having,
> "having after sort",
> @@ -3331,6 +3394,7 @@ make_join_statistics(JOIN *join, List<TA
> */
> join->best_positions= (POSITION*) join->thd->alloc(sizeof(POSITION)*
> (table_count +1));
> + join->cur_exec_tab= (JOIN_TAB**) join->thd->calloc(sizeof(JOIN_TAB*));
>
> if (join->thd->is_fatal_error)
> DBUG_RETURN(1); // Eom /* purecov: inspected */
> @@ -3884,6 +3948,9 @@ make_join_statistics(JOIN *join, List<TA
>
> if (join->const_tables != join->table_count)
> optimize_keyuse(join, keyuse_array);
> +
> + if (join->optimize_unflattened_subqueries())
> + DBUG_RETURN(1);
>
> if (optimize_semijoin_nests(join, all_table_map))
> DBUG_RETURN(TRUE); /* purecov: inspected */
> @@ -6879,28 +6946,57 @@ void JOIN::get_prefix_cost_and_fanout(ui
>
>
> /**
> + Compute the total cost of all subqueries of a (sub)query.
> +
> + @return the cost of all subqueries
> +*/
> +
> +double JOIN::total_subquery_cost()
psergey: I think this function needs to be called get_total_subquery() cost,
to follow the pattern set by other similar functions.
When I read this function, I don't understand why it doesn't account for the
fact that a subquery might be evaluated multiple times. Please add a comment
about that.
> +{
> + double total_cost= 0;
> + for (SELECT_LEX_UNIT *un= select_lex->first_inner_unit();
> + un; un= un->next_unit())
> + {
> + Item_subselect *subquery_predicate= un->item;
> + if (!subquery_predicate)
> + continue;
> +
> + total_cost+= subquery_predicate->get_cost();
> + }
> + return total_cost;
> +}
> +
> +
> +/**
> Estimate the number of rows that query execution will read.
>
> @todo This is a very pessimistic upper bound. Use join selectivity
> when available to produce a more realistic number.
> */
>
> -double JOIN::get_examined_rows()
> +double JOIN::get_examined_rows_estimate()
> {
> - ha_rows examined_rows;
> - double prev_fanout= 1;
> - JOIN_TAB *tab= first_breadth_first_tab(this, WALK_OPTIMIZATION_TABS);
> - JOIN_TAB *prev_tab= tab;
> + if (examined_rows_estimate >= 0)
> + return examined_rows_estimate;
> + else if (!(have_query_plan == QEP_EXPLAINABLE ||
> + have_query_plan == QEP_EXECUTABLE))
> + return -1;
> + else
> + {
> + double prev_fanout= 1;
> + JOIN_TAB *tab= first_breadth_first_tab(this, WALK_OPTIMIZATION_TABS);
> + JOIN_TAB *prev_tab= tab;
>
> - examined_rows= tab->get_examined_rows();
> + examined_rows_estimate= tab->get_examined_rows_estimate();
>
> - while ((tab= next_breadth_first_tab(this, WALK_OPTIMIZATION_TABS, tab)))
> - {
> - prev_fanout *= prev_tab->records_read;
> - examined_rows+= (ha_rows) (tab->get_examined_rows() * prev_fanout);
> - prev_tab= tab;
> + while ((tab= next_breadth_first_tab(this, WALK_OPTIMIZATION_TABS, tab)))
> + {
> + prev_fanout *= prev_tab->records_read;
> + examined_rows_estimate+= tab->get_examined_rows_estimate() * prev_fanout;
> + prev_tab= tab;
> + }
> + return examined_rows_estimate;
> }
> - return examined_rows;
> }
>
>
> @@ -7374,15 +7470,15 @@ best_extension_by_limited_search(JOIN
> remaining_tables &
> ~real_table_bit);
> join->positions[idx].cond_selectivity= pushdown_cond_selectivity;
> - double partial_join_cardinality= current_record_count *
> - pushdown_cond_selectivity;
> + join->positions[idx].partial_join_cardinality= current_record_count *
> + pushdown_cond_selectivity;
> if ( (search_depth > 1) && (remaining_tables & ~real_table_bit) & allowed_tables )
> { /* Recursively expand the current partial plan */
> swap_variables(JOIN_TAB*, join->best_ref[idx], *pos);
> if (best_extension_by_limited_search(join,
> remaining_tables & ~real_table_bit,
> idx + 1,
> - partial_join_cardinality,
> + join->positions[idx].partial_join_cardinality,
> current_read_time,
> search_depth - 1,
> prune_level,
> @@ -7400,13 +7496,68 @@ best_extension_by_limited_search(JOIN
> join->positions[join->const_tables].table->table)
> /* We have to make a temp table */
> current_read_time+= current_record_count;
> +
> + /*
> + For each subquery find the partial join with the least cardinality
> + in order to minimize the number of subquery evaluations.
> + */
psergey: Please move this new piece of code into a separate function. It does
one thing, its action can be easily separated from what the rest of the code in
this function does, so it should be moved away.
> + if (optimizer_flag(thd, OPTIMIZER_SWITCH_EXPENSIVE_PRED_STATIC_PUSHDOWN) &&
> + join->conds && join->conds->with_subselect
> + /* @todo This is not a reliable estimate:
> + && join->conds->is_expensive()*/)
> + {
> + bool may_benefit_from_static_pushdown= false;
> + /*
> + Find and store the POSITION with the least partial result cardinality
> + between the current and last POSITION in the query plan.
> + */
psergey: Does the loop below do what the comment says? AFAIU, the code sets
"least_card_pos" for every position in the join->positions[] array.
> + POSITION *min_pos= NULL;
> + for (POSITION *p_pos= join->positions + idx,
> + *p_end= join->positions + join->const_tables;
> + p_pos >= p_end ;
> + p_pos--)
> + {
> + if (!min_pos ||
> + p_pos->partial_join_cardinality <= min_pos->partial_join_cardinality)
> + min_pos= p_pos;
> + p_pos->least_card_pos= min_pos;
> + if (p_pos->least_card_pos != p_pos)
> + may_benefit_from_static_pushdown= true;
> + }
> +
> + /*
> + Make expensive predicates dependent on the table that is in the
> + partial join with the least cardinality after the left-most join
> + where the predicate can be pushed.
> + */
psergey: ^ out of date comment?
> + if (may_benefit_from_static_pushdown)
> + {
> + if (is_cond_and(join->conds))
> + {
psergey: I find it weird that every time you need to analyze the WHERE
condition for expensive items. I would have expected that you would
- collect a list of AND-parts of the WHERE that can be moved
- do some pre-processing (like, fetch their cost)
and after that run join optimization that would use this data.
> + List_iterator<Item> and_parts_it(*((Item_cond*) join->conds)->argument_list());
> + Item *and_part;
> + while ((and_part= and_parts_it++))
> + {
> + if (!and_part->with_subselect
> + /* @todo This is not a reliable estimate:
> + !and_part->is_expensive()*/)
psergey: There are a lot of places in the code that check a condition whether
a given Item object should be processed by cost-based condition pushdown.
This means, the check should be factored out into a function.
> + continue;
> + current_read_time+= join->static_pushdown_cost(and_part, idx);
> + }
> + }
> + else
> + current_read_time+= join->static_pushdown_cost(join->conds, idx);
> + }
> + }
> +
> if (current_read_time < join->best_read)
> {
> memcpy((uchar*) join->best_positions, (uchar*) join->positions,
> sizeof(POSITION) * (idx + 1));
> - join->record_count= partial_join_cardinality;
> + join->record_count= join->positions[idx].partial_join_cardinality;
> join->best_read= current_read_time - 0.001;
> }
> +
> DBUG_EXECUTE("opt", print_plan(join, idx+1,
> current_record_count,
> read_time,
> @@ -7605,7 +7756,7 @@ int JOIN_TAB::make_scan_filter()
> if (cond &&
> (tmp= make_cond_for_table(join->thd, cond,
> join->const_table_map | table->map,
> - table->map, -1, FALSE, TRUE)))
> + table->map, FALSE, TRUE)))
> {
> DBUG_EXECUTE("where",print_where(tmp,"cache", QT_ORDINARY););
> if (!(cache_select=
> @@ -8048,7 +8199,8 @@ get_best_combination(JOIN *join)
> sub-order
> */
> SJ_MATERIALIZATION_INFO *sjm= cur_pos->table->emb_sj_nest->sj_mat_info;
> - j->records= j->records_read= (ha_rows)(sjm->is_sj_scan? sjm->rows : 1);
> + j->records_read= sjm->is_sj_scan? sjm->rows : 1;
> + j->records= (ha_rows) j->records_read;
> j->cond_selectivity= 1.0;
> JOIN_TAB *jt;
> JOIN_TAB_RANGE *jt_range;
> @@ -8112,7 +8264,7 @@ get_best_combination(JOIN *join)
> Save records_read in JOIN_TAB so that select_describe()/etc don't have
> to access join->best_positions[].
> */
> - j->records_read= (ha_rows)join->best_positions[tablenr].records_read;
> + j->records_read= join->best_positions[tablenr].records_read;
> j->cond_selectivity= join->best_positions[tablenr].cond_selectivity;
> join->map2table[j->table->tablenr]= j;
>
> @@ -8614,6 +8766,7 @@ JOIN::make_simple_join(JOIN *parent, TAB
> bzero((char*) &join_tab->read_record,sizeof(join_tab->read_record));
> temp_table->status=0;
> temp_table->null_row=0;
> + *cur_exec_tab= *(parent->cur_exec_tab);
> DBUG_RETURN(FALSE);
> }
>
> @@ -8969,7 +9122,7 @@ make_join_select(JOIN *join,SQL_SELECT *
> join->exec_const_cond=
> make_cond_for_table(thd, cond,
> join->const_table_map,
> - (table_map) 0, -1, FALSE, FALSE);
> + (table_map) 0, FALSE, FALSE);
> /* Add conditions added by add_not_null_conds(). */
> for (uint i= 0 ; i < join->const_tables ; i++)
> add_cond_and_fix(thd, &join->exec_const_cond,
> @@ -8991,7 +9144,7 @@ make_join_select(JOIN *join,SQL_SELECT *
> join->const_table_map |
> OUTER_REF_TABLE_BIT,
> OUTER_REF_TABLE_BIT,
> - -1, FALSE, FALSE);
> + FALSE, FALSE);
> if (outer_ref_cond)
> {
> add_cond_and_fix(thd, &outer_ref_cond, join->outer_ref_cond);
> @@ -9005,7 +9158,7 @@ make_join_select(JOIN *join,SQL_SELECT *
> join->const_table_map |
> PSEUDO_TABLE_BITS,
> PSEUDO_TABLE_BITS,
> - -1, FALSE, FALSE);
> + FALSE, FALSE);
> if (pseudo_bits_cond)
> {
> add_cond_and_fix(thd, &pseudo_bits_cond,
> @@ -9102,8 +9255,10 @@ make_join_select(JOIN *join,SQL_SELECT *
> }
> else
> {
> - tmp= make_cond_for_table(thd, cond, used_tables, current_map, i,
> + tmp= make_cond_for_table(thd, cond, used_tables, current_map,
> FALSE, FALSE);
> + if (tmp)
> + tmp->walk(&Item::set_join_tab_idx_processor, 0, (uchar*) &i);
> }
> /* Add conditions added by add_not_null_conds(). */
> if (tab->select_cond)
> @@ -9184,7 +9339,7 @@ make_join_select(JOIN *join,SQL_SELECT *
> {
> COND *push_cond=
> make_cond_for_table(thd, tmp, current_map, current_map,
> - -1, FALSE, FALSE);
> + FALSE, FALSE);
> if (push_cond)
> {
> /* Push condition to handler */
> @@ -9357,7 +9512,7 @@ make_join_select(JOIN *join,SQL_SELECT *
> JOIN_TAB *cond_tab= join_tab->first_inner;
> COND *tmp= make_cond_for_table(thd, *join_tab->on_expr_ref,
> join->const_table_map,
> - (table_map) 0, -1, FALSE, FALSE);
> + (table_map) 0, FALSE, FALSE);
> if (!tmp)
> continue;
> tmp= new Item_func_trig_cond(tmp, &cond_tab->not_null_compl);
> @@ -9403,11 +9558,11 @@ make_join_select(JOIN *join,SQL_SELECT *
> current_map= tab->table->map;
> used_tables2|= current_map;
> /*
> - psergey: have put the -1 below. It's bad, will need to fix it.
> + psergey: After pushdown, set the join_tab_idx of tmp_cond to the
> + right join tab (tab - first_tab)?
> */
> COND *tmp_cond= make_cond_for_table(thd, on_expr, used_tables2,
> - current_map, /*(tab - first_tab)*/ -1,
> - FALSE, FALSE);
> + current_map, FALSE, FALSE);
> bool is_sjm_lookup_tab= FALSE;
> if (tab->bush_children)
> {
> @@ -10198,6 +10353,8 @@ uint check_join_cache_usage(JOIN_TAB *ta
> if ((tab->cache= new JOIN_CACHE_BNL(join, tab, prev_cache)) &&
> ((options & SELECT_DESCRIBE) || !tab->cache->init()))
> {
> + if (options & SELECT_DESCRIBE)
> + join->have_query_plan= JOIN::QEP_INCOMPLETE;
> tab->icp_other_tables_ok= FALSE;
> return (2-test(!prev_cache));
> }
> @@ -10233,6 +10390,8 @@ uint check_join_cache_usage(JOIN_TAB *ta
> if ((tab->cache= new JOIN_CACHE_BNLH(join, tab, prev_cache)) &&
> ((options & SELECT_DESCRIBE) || !tab->cache->init()))
> {
> + if (options & SELECT_DESCRIBE)
> + join->have_query_plan= JOIN::QEP_INCOMPLETE;
> tab->icp_other_tables_ok= FALSE;
> return (4-test(!prev_cache));
> }
> @@ -10253,7 +10412,11 @@ uint check_join_cache_usage(JOIN_TAB *ta
> prev_cache= 0;
> if ((tab->cache= new JOIN_CACHE_BKA(join, tab, flags, prev_cache)) &&
> ((options & SELECT_DESCRIBE) || !tab->cache->init()))
> + {
> + if (options & SELECT_DESCRIBE)
> + join->have_query_plan= JOIN::QEP_INCOMPLETE;
> return (6-test(!prev_cache));
> + }
> goto no_join_cache;
> }
> else
> @@ -10263,7 +10426,9 @@ uint check_join_cache_usage(JOIN_TAB *ta
> if ((tab->cache= new JOIN_CACHE_BKAH(join, tab, flags, prev_cache)) &&
> ((options & SELECT_DESCRIBE) || !tab->cache->init()))
> {
> - tab->idx_cond_fact_out= FALSE;
> + if (options & SELECT_DESCRIBE)
> + join->have_query_plan= JOIN::QEP_INCOMPLETE;
> + tab->idx_cond_fact_out= FALSE;
> return (8-test(!prev_cache));
> }
> goto no_join_cache;
> @@ -10422,10 +10587,9 @@ make_join_readinfo(JOIN *join, ulonglong
> for (tab= join->join_tab; tab != join->join_tab + join->const_tables; tab++)
> tab->partial_join_cardinality= 1;
>
> - JOIN_TAB *prev_tab= NULL;
> for (tab= first_linear_tab(join, WITHOUT_CONST_TABLES), i= join->const_tables;
> tab;
> - prev_tab=tab, tab= next_linear_tab(join, tab, WITH_BUSH_ROOTS))
> + tab= next_linear_tab(join, tab, WITH_BUSH_ROOTS))
> {
> /*
> The approximation below for partial join cardinality is not good because
> @@ -10435,12 +10599,9 @@ make_join_readinfo(JOIN *join, ulonglong
> Later it should be improved.
> */
>
> - if (tab->bush_root_tab && tab->bush_root_tab->bush_children->start == tab)
> - prev_tab= NULL;
> DBUG_ASSERT(tab->bush_children || tab->table == join->best_positions[i].table->table);
>
> - tab->partial_join_cardinality= join->best_positions[i].records_read *
> - (prev_tab? prev_tab->partial_join_cardinality : 1);
> + tab->partial_join_cardinality= join->best_positions[i].partial_join_cardinality;
> if (!tab->bush_children)
> i++;
> }
> @@ -10807,9 +10968,9 @@ double JOIN_TAB::scan_time()
> @todo: why not use JOIN_TAB::found_records
> */
>
> -ha_rows JOIN_TAB::get_examined_rows()
> +double JOIN_TAB::get_examined_rows_estimate()
> {
> - ha_rows examined_rows;
> + double examined_rows;
>
> if (select && select->quick && use_quick != 2)
> examined_rows= select->quick->records;
> @@ -10839,7 +11000,7 @@ ha_rows JOIN_TAB::get_examined_rows()
> }
> }
> else
> - examined_rows= (ha_rows) records_read;
> + examined_rows= records_read;
>
> return examined_rows;
> }
> @@ -16457,6 +16618,13 @@ do_select(JOIN *join,List<Item> *fields,
> sufficient to check only the condition pseudo_bits_cond.
> */
> DBUG_ASSERT(join->outer_ref_cond == NULL);
> + DBUG_EXECUTE_IF("show_explain_probe_do_select_const",
> + if (dbug_user_var_equals_int(join->thd,
> + "show_explain_probe_select_id",
> + join->select_lex->select_number))
> + dbug_serve_apcs(join->thd, 1);
> + );
> +
> if (!join->pseudo_bits_cond || join->pseudo_bits_cond->val_int())
> {
> error= (*end_select)(join, 0, 0);
> @@ -16512,6 +16680,23 @@ do_select(JOIN *join,List<Item> *fields,
> if (error == NESTED_LOOP_NO_MORE_ROWS || join->thd->killed == ABORT_QUERY)
> error= NESTED_LOOP_OK;
>
> + DBUG_EXECUTE("counts",
> + join->is_top_level_join() ?
> + print_execution_stats(join) : (void)0;);
> + /* Reset execution-time counters. */
> + if (join->table_count)
> + {
> + JOIN_TAB *first_tab, *tab;
> + first_tab= first_breadth_first_tab(join, WALK_OPTIMIZATION_TABS);
> + for (tab= first_tab; tab;
> + tab= next_breadth_first_tab(join, WALK_OPTIMIZATION_TABS, tab))
> + {
> + if (tab->bush_root_tab)
> + continue; /* TODO: how to process nested tabs?*/
> + tab->exec_partial_join_records= 0;
> + }
> + }
> +
> if (table)
> {
> int tmp, new_errno= 0;
> @@ -16927,6 +17112,8 @@ evaluate_join_record(JOIN *join, JOIN_TA
> if (join_tab->table->vfield)
> update_virtual_fields(join->thd, join_tab->table);
>
> + *(join->cur_exec_tab)= join_tab;
> +
> if (select_cond)
> {
> select_cond_result= test(select_cond->val_int());
> @@ -17053,6 +17240,7 @@ evaluate_join_record(JOIN *join, JOIN_TA
> if (found)
> {
> enum enum_nested_loop_state rc;
> + join_tab->exec_partial_join_records++;
> /* A match from join_tab is found for the current partial join. */
> rc= (*join_tab->next_select)(join, join_tab+1, 0);
> join->thd->warning_info->inc_current_row_for_warning();
> @@ -17106,6 +17294,7 @@ evaluate_null_complemented_join_record(J
> and no matches has been found for the current outer row.
> */
> JOIN_TAB *last_inner_tab= join_tab->last_inner;
> + *(join->cur_exec_tab)= join_tab;
> /* Cache variables for faster loop */
> COND *select_cond;
> for ( ; join_tab <= last_inner_tab ; join_tab++)
> @@ -17169,6 +17358,8 @@ evaluate_null_complemented_join_record(J
> join->return_tab= join_tab->do_firstmatch;
> }
>
> + join_tab->exec_partial_join_records++;
> +
> /*
> Send the row complemented by nulls to be joined with the
> remaining tables.
> @@ -18536,21 +18727,34 @@ bool test_if_ref(Item *root_cond, Item_f
> static Item *
> make_cond_for_table(THD *thd, Item *cond, table_map tables,
> table_map used_table,
> - int join_tab_idx_arg,
> bool exclude_expensive_cond __attribute__((unused)),
> bool retain_ref_cond)
> {
> return make_cond_for_table_from_pred(thd, cond, cond, tables, used_table,
> - join_tab_idx_arg,
> exclude_expensive_cond,
> retain_ref_cond);
> }
>
>
> +static int compare_items_by_cost(Item* it1, Item* it2, void *arg)
> +{
> + double cost1= 0, cost2= 0;
> +
> + it1->walk(&Item::sum_cost_processor, false, (uchar*) &cost1);
> + it2->walk(&Item::sum_cost_processor, false, (uchar*) &cost2);
> +
> + if (cost1 == cost2)
> + return 0;
> + if (cost1 > cost2)
> + return -1;
> + else
> + return 1;
> +}
> +
> +
> static Item *
> make_cond_for_table_from_pred(THD *thd, Item *root_cond, Item *cond,
> table_map tables, table_map used_table,
> - int join_tab_idx_arg,
> bool exclude_expensive_cond __attribute__
> ((unused)),
> bool retain_ref_cond)
> @@ -18573,7 +18777,6 @@ make_cond_for_table_from_pred(THD *thd,
> {
> Item *fix=make_cond_for_table_from_pred(thd, root_cond, item,
> tables, used_table,
> - join_tab_idx_arg,
> exclude_expensive_cond,
> retain_ref_cond);
> if (fix)
> @@ -18594,6 +18797,11 @@ make_cond_for_table_from_pred(THD *thd,
> new_cond->used_tables_cache=
> ((Item_cond_and*) cond)->used_tables_cache &
> tables;
> + /*
> + Move expensive conditions to the end of the AND list to reduce the
> + number of times they are evaluated.
> + */
> + bubble_sort<Item>(new_cond->argument_list(), compare_items_by_cost, NULL);
> return new_cond;
> }
> }
> @@ -18608,7 +18816,6 @@ make_cond_for_table_from_pred(THD *thd,
> {
> Item *fix=make_cond_for_table_from_pred(thd, root_cond, item,
> tables, 0L,
> - join_tab_idx_arg,
> exclude_expensive_cond,
> retain_ref_cond);
> if (!fix)
> @@ -18623,6 +18830,11 @@ make_cond_for_table_from_pred(THD *thd,
> new_cond->fix_fields(thd, 0);
> new_cond->used_tables_cache= ((Item_cond_or*) cond)->used_tables_cache;
> new_cond->top_level_item();
> + /*
> + Move expensive conditions to the end of the AND list to reduce the
> + number of times they are evaluated.
> + */
> + bubble_sort<Item>(new_cond->argument_list(), compare_items_by_cost, NULL);
> return new_cond;
> }
> }
> @@ -18638,7 +18850,6 @@ make_cond_for_table_from_pred(THD *thd,
>
> if (cond->marker == 2 || cond->eq_cmp_result() == Item::COND_OK)
> {
> - cond->set_join_tab_idx(join_tab_idx_arg);
> return cond; // Not boolean op
> }
>
> @@ -18661,7 +18872,6 @@ make_cond_for_table_from_pred(THD *thd,
> }
> }
> cond->marker=2;
> - cond->set_join_tab_idx(join_tab_idx_arg);
> return cond;
> }
>
> @@ -19688,6 +19898,7 @@ create_sort_index(THD *thd, JOIN *join,
> tab= join->join_tab + join->const_tables;
> table= tab->table;
> select= tab->select;
> + *(join->cur_exec_tab)= tab;
>
> JOIN_TAB *save_pre_sort_join_tab= NULL;
> if (join->pre_sort_join_tab)
> @@ -22287,7 +22498,8 @@ int JOIN::print_explain(select_result_si
> DBUG_PRINT("info", ("Select 0x%lx, type %s, message %s",
> (ulong)join->select_lex, join->select_lex->type,
> message ? message : "NULL"));
> - DBUG_ASSERT(on_the_fly? have_query_plan == QEP_AVAILABLE: TRUE);
> + DBUG_ASSERT(on_the_fly? (have_query_plan == QEP_EXPLAINABLE ||
> + have_query_plan == QEP_EXECUTABLE) : TRUE);
> /* Don't log this into the slow query log */
>
> if (!on_the_fly)
> @@ -22326,6 +22538,9 @@ int JOIN::print_explain(select_result_si
> uint select_id= join->select_lex->select_number;
> JOIN_TAB* const first_top_tab= first_breadth_first_tab(join, WALK_OPTIMIZATION_TABS);
>
> + if (!join->initialized && join->setup_dynamic_conditions())
> + return 1;
> +
> for (JOIN_TAB *tab= first_breadth_first_tab(join, WALK_OPTIMIZATION_TABS); tab;
> tab= next_breadth_first_tab(join, WALK_OPTIMIZATION_TABS, tab))
> {
> @@ -22583,7 +22798,7 @@ int JOIN::print_explain(select_result_si
> }
> else
> {
> - ha_rows examined_rows= tab->get_examined_rows();
> + double examined_rows= tab->get_examined_rows_estimate();
>
> item_list.push_back(new Item_int((longlong) (ulonglong) examined_rows,
> MY_INT64_NUM_DECIMAL_DIGITS));
> @@ -22596,7 +22811,7 @@ int JOIN::print_explain(select_result_si
> {
> double pushdown_cond_selectivity= tab->cond_selectivity;
> if (pushdown_cond_selectivity == 1.0)
> - f= (float) (100.0 * tab->records_read / examined_rows);
> + f= (100.0 * (float)tab->records_read) / examined_rows;
> else
> f= (float) (100.0 * pushdown_cond_selectivity);
> }
> @@ -22687,6 +22902,25 @@ int JOIN::print_explain(select_result_si
> }
> else
> extra.append(STRING_WITH_LEN("; Using where"));
> + /* Add the id's of the subselects to the table they are pushed to. */
> + if (explain_flags & DESCRIBE_EXTENDED)
> + {
> + List<Item_subselect> subs_list;
> + *cur_exec_tab= tab;
> + tab->select_cond->walk(&Item::collect_item_subselect_processor, 0,
> + (uchar*) &subs_list);
> + if (subs_list.elements)
> + {
> + List_iterator_fast<Item_subselect> subs_it(subs_list);
> + Item_subselect *subs;
> + extra.append(STRING_WITH_LEN("; Subqueries:"));
> + while ((subs= subs_it++))
> + {
> + extra.append(STRING_WITH_LEN(" "));
> + extra.append_ulonglong(subs->get_select_lex()->select_number);
> + }
> + }
> + }
> }
> }
> if (table_list /* SJM bushes don't have table_list */ &&
> @@ -23652,6 +23886,204 @@ void JOIN::cache_const_exprs()
> }
>
>
> +double JOIN::static_pushdown_cost(Item *pushed_cond, uint idx)
> +{
psergey: This function needs a comment.
I see that it modifies position opt_pos->read_time. What is the function
that undoes this change?
> + table_map cond_tables= pushed_cond->used_tables();
> + POSITION *last_pos= NULL;
> + POSITION *opt_pos;
> + double pushed_cond_cost;
> +
> + DBUG_ASSERT(pushed_cond->with_subselect
> + /*pushed_cond->is_expensive()*/);
> +
> + if (!cond_tables || cond_tables == OUTER_REF_TABLE_BIT || idx <= const_tables)
> + return 0;
> +
> + /*
> + Find the last join_tab on which the subquery depends. This is the
> + earliest join_tab where the condition can be pushed.
> + */
> + for (POSITION *p_pos= positions + idx, *p_end= positions + const_tables;
> + p_pos >= p_end ; p_pos--)
> + {
> + if (p_pos->table->table->map & cond_tables)
> + {
> + last_pos= p_pos;
> + break;
> + }
> + }
> + if (!last_pos)
> + return 0; // pushed_cond depends only on constant tables
> +
> + opt_pos= last_pos->least_card_pos;
> +
> + /*
> + Update the cost of the POSITION where the subquery is pushed to,
> + and the total cost of the query plan.
> + */
> + pushed_cond_cost= opt_pos->partial_join_cardinality * pushed_cond->get_cost();
> + opt_pos->read_time+= pushed_cond_cost;
> +
> + return pushed_cond_cost;
> +}
> +
> +
> +/**
> + Make expensive subqueries dynamically pushdownable.
> +*/
> +
> +bool JOIN::setup_dynamic_conditions()
> +{
> + List<Item_func_dynamic_cond> prev_dynamic_conds;
> + List<Item_func_dynamic_cond> cur_dynamic_conds;
> + bool do_static_pushdown= false, do_dynamic_pushdown= false;
> + JOIN_TAB *first_tab, *last_tab, *min_tab, *tab;
> +
> + if (!(optimizer_flag(thd, OPTIMIZER_SWITCH_EXPENSIVE_PRED_STATIC_PUSHDOWN) ||
> + optimizer_flag(thd, OPTIMIZER_SWITCH_EXPENSIVE_PRED_DYNAMIC_PUSHDOWN)) || !conds ||
> + (table_count - const_tables < 2))
> + return false;
> +
> + first_tab= join_tab + const_tables;
> + last_tab= join_tab + top_join_tab_count;
> +
> + /*
> + Update the least cardinality join_tab for each tab for the chosen join order.
> + Check if there are any conditions at all that need to be made dynamic.
> + */
> + min_tab= NULL;
> + for (tab= last_tab - 1; tab >= first_tab; tab--)
> + {
> + if (optimizer_flag(thd, OPTIMIZER_SWITCH_EXPENSIVE_PRED_STATIC_PUSHDOWN))
psergey: Is there any reason to have this check inside the loop? Please move
it outside.
> + {
> + if (!min_tab ||
> + tab->partial_join_cardinality <= min_tab->partial_join_cardinality)
> + min_tab= tab;
> + tab->least_card_tab= min_tab;
> + /*
> + At least one join_tab must have lower cardinality than its preceeding
> + join_tabs in order to use static pushdown.
> + */
> + if (tab != tab->least_card_tab)
> + do_static_pushdown= true;
> + }
> + else
> + {
> + /* Only dynamic pushdown, push conditions to the left-most possible tab. */
> + tab->least_card_tab= tab;
> + }
> +
> + if (optimizer_flag(thd, OPTIMIZER_SWITCH_EXPENSIVE_PRED_DYNAMIC_PUSHDOWN) &&
> + tab->select_cond && tab->select_cond->needs_dynamic_cond())
> + do_dynamic_pushdown= true;
> + }
> +
> + if (!do_static_pushdown && !do_dynamic_pushdown)
> + return false;
> +
> + /*
> + TODO: the following loop processes only the top-level JOIN_TABs. The loop
> + skips semi-join nests. Each semi-join nest must be processed in the same
> + way as the top-level joins.
> + */
> + for (tab= first_tab; tab < last_tab; tab++)
> + {
> + Item_func_collect_stats *wrapped_cond;
> +
> + if (tab->select_cond)
> + {
> + if (is_cond_and(tab->select_cond))
> + {
> + List_iterator<Item> and_parts_it(*((Item_cond*) tab->select_cond)->argument_list());
> + Item *and_part;
> + while ((and_part= and_parts_it++))
> + {
> + if (and_part->needs_dynamic_cond())
> + {
> + /*
> + TODO:
> + If the leftmost join_tab where the dynamic condion can be pushed
> + is in fact the last join table, do not wrap the condition since
> + it has nowhere to move.
> + */
> + if (!(wrapped_cond= new Item_func_dynamic_cond(and_part,
> + tab->least_card_tab,
> + cur_exec_tab)))
> + return true;
> + cur_dynamic_conds.push_back((Item_func_dynamic_cond*)wrapped_cond);
> + }
> + else
> + {
> + if (!(wrapped_cond= new Item_func_collect_stats(and_part)))
> + return true;
> + }
> + and_parts_it.replace(wrapped_cond);
> + }
> + }
> + else if (tab->select_cond->needs_dynamic_cond())
> + {
> + /*
> + TODO:
> + If the leftmost join_tab where the dynamic condion can be pushed
> + is in fact the last join table, do not wrap the condition since
> + it has nowhere to move.
> + */
> + if (!(wrapped_cond= new Item_func_dynamic_cond(tab->select_cond,
> + tab->least_card_tab,
> + cur_exec_tab)))
> + return true;
> + tab->set_select_cond(wrapped_cond, __LINE__);
> + cur_dynamic_conds.push_back((Item_func_dynamic_cond*)wrapped_cond);
> + }
> + else
> + {
> + if (!(wrapped_cond= new Item_func_collect_stats(tab->select_cond)))
> + return true;
> + tab->set_select_cond(wrapped_cond, __LINE__);
> + }
> + }
> +
> + /*
> + Add all dynamic conditions pushed to previous JOIN_TABs also to the
> + current JOIN_TAB. This allows to "move" a dynamic condition
> + from one tab to another by enabling it for the corrseponding tab.
> + */
> + List_iterator_fast<Item_func_dynamic_cond> dyn_cond_it(prev_dynamic_conds);
> + while ((wrapped_cond= dyn_cond_it++))
> + {
> + if (is_cond_and(tab->select_cond))
> + ((Item_cond_and*)tab->select_cond)->add(wrapped_cond);
> + else
> + {
> + tab->set_select_cond(and_items(tab->select_cond, wrapped_cond), __LINE__);
> + tab->select_cond->quick_fix_field();
> + }
> + }
> + prev_dynamic_conds.concat(&cur_dynamic_conds);
> + cur_dynamic_conds.empty();
> +
> + /*
> + If the top-level condition attached to this JOIN_TAB is not wrapped in a
> + condition that can collect statistics, wrap it into one.
> + */
> + if (tab->select_cond &&
> + !(tab->select_cond->type() == Item::FUNC_ITEM &&
> + (((Item_func*)tab->select_cond)->functype() == Item_func::DYNAMIC_COND_FUNC ||
> + ((Item_func*)tab->select_cond)->functype() == Item_func::EXEC_STATS_FUNC)))
> + {
> + if (!(wrapped_cond= new Item_func_collect_stats(tab->select_cond)))
> + return true;
> + tab->set_select_cond(wrapped_cond, __LINE__);
> + }
> +
> + if (tab->select_cond)
> + tab->select_cond->quick_fix_field();
> + }
> +
> + return false;
> +}
> +
> +
> /**
> Find a cheaper access key than a given @a key
>
>
> === modified file 'sql/sql_select.h'
> --- sql/sql_select.h 2013-07-17 19:24:29 +0000
> +++ sql/sql_select.h 2013-09-27 08:25:14 +0000
> @@ -289,8 +289,8 @@ typedef struct st_join_table {
> */
> double read_time;
>
> - /* psergey-todo: make the below have type double, like POSITION::records_read? */
> - ha_rows records_read;
> + /* Copy of POSITION::records_read, set by get_best_combination() */
> + double records_read;
>
> /* The selectivity of the conditions that can be pushed to the table */
> double cond_selectivity;
> @@ -300,6 +300,12 @@ typedef struct st_join_table {
>
> double partial_join_cardinality;
>
> + /* Execution-time counters of actual cardinality and selectivity. */
> + /* The cardinality of the partial join represented by this JOIN_TAB. */
> + double exec_partial_join_records;
psergey: I understand this is a counter? Why use floating point type for it,
then?
> + /* Estimate of the fanout of this join based on execution-time counters. */
> + double exec_fanout;
> +
> table_map dependent,key_dependent;
> /*
> 1 - use quick select
> @@ -377,6 +383,12 @@ typedef struct st_join_table {
> NULL - Not doing a loose scan on this join tab.
> */
> struct st_join_table *loosescan_match_tab;
> +
> + /*
> + The JOIN_TAB after this one with the least partial join cardinality.
> + Valid only after the join optimization phase.
> + */
> + st_join_table *least_card_tab;
>
> /* TRUE <=> we are inside LooseScan range */
> bool inside_loosescan_range;
> @@ -522,7 +534,7 @@ typedef struct st_join_table {
> return (is_hash_join_key_no(key) ? hj_key : table->key_info+key);
> }
> double scan_time();
> - ha_rows get_examined_rows();
> + double get_examined_rows_estimate();
> bool preread_init();
>
> bool is_sjm_nest() { return test(bush_children); }
> @@ -767,6 +779,8 @@ typedef struct st_position :public Sql_a
> {
> /* The table that's put into join order */
> JOIN_TAB *table;
> + /* The JOIN_TAB after this one with the least partial join cardinality */
> + st_position *least_card_pos;
>
> /*
> The "fanout": number of output rows that will be produced (after
> @@ -777,6 +791,7 @@ typedef struct st_position :public Sql_a
>
> /* The selectivity of the pushed down conditions */
> double cond_selectivity;
> + double partial_join_cardinality;
>
> /*
> Cost accessing the table in course of the entire complete join execution,
> @@ -917,6 +932,7 @@ class JOIN :public Sql_alloc
> void restore_query_plan(Join_plan_state *restore_from);
> /* Choose a subquery plan for a table-less subquery. */
> bool choose_tableless_subquery_plan();
> + double examined_rows_estimate;
>
> public:
> JOIN_TAB *join_tab, **best_ref;
> @@ -1214,13 +1230,20 @@ class JOIN :public Sql_alloc
>
> bool union_part; ///< this subselect is part of union
>
> - enum join_optimization_state { NOT_OPTIMIZED=0,
> - OPTIMIZATION_IN_PROGRESS=1,
> - OPTIMIZATION_DONE=2};
> bool optimized; ///< flag to avoid double optimization in EXPLAIN
> bool initialized; ///< flag to avoid double init_execution calls
>
> - enum { QEP_NOT_PRESENT_YET, QEP_AVAILABLE, QEP_DELETED} have_query_plan;
> + enum {QEP_NOT_PRESENT_YET,
> + /* A query plan exists. It is being optimized, but is still incomplete. */
> + QEP_INCOMPLETE,
> + /*
> + A query plan exists, but it is usable only for EXPLAIN because
> + some of its structures are incomplete.
> + */
> + QEP_EXPLAINABLE,
> + /* A query plan exists, and is ready for execution. */
> + QEP_EXECUTABLE,
> + QEP_DELETED} have_query_plan;
>
> /*
> Additional WHERE and HAVING predicates to be considered for IN=>EXISTS
> @@ -1248,6 +1271,10 @@ class JOIN :public Sql_alloc
> TABLE *table_reexec[1]; // make_simple_join()
> JOIN_TAB *join_tab_reexec; // make_simple_join()
> /* end of allocation caching storage */
> + /*
> + The index of the current partial join being processed during query execution.
> + */
> + JOIN_TAB **cur_exec_tab;
>
> JOIN(THD *thd_arg, List<Item> &fields_arg, ulonglong select_options_arg,
> select_result *result_arg)
> @@ -1259,7 +1286,7 @@ class JOIN :public Sql_alloc
> void init(THD *thd_arg, List<Item> &fields_arg, ulonglong select_options_arg,
> select_result *result_arg)
> {
> - join_tab= join_tab_save= 0;
> + join_tab= join_tab_save= table_access_tabs= 0;
> table= 0;
> table_count= 0;
> top_join_tab_count= 0;
> @@ -1274,6 +1301,7 @@ class JOIN :public Sql_alloc
> found_records= 0;
> fetch_limit= HA_POS_ERROR;
> examined_rows= 0;
> + examined_rows_estimate= -1;
> exec_tmp_table1= 0;
> exec_tmp_table2= 0;
> sortorder= 0;
> @@ -1334,6 +1362,7 @@ class JOIN :public Sql_alloc
> query plan was produced
> */
> table_access_tabs= NULL;
> + cur_exec_tab= NULL;
> }
>
> int prepare(Item ***rref_pointer_array, TABLE_LIST *tables, uint wind_num,
> @@ -1352,6 +1381,7 @@ class JOIN :public Sql_alloc
> bool alloc_func_list();
> bool flatten_subqueries();
> bool optimize_unflattened_subqueries();
> + bool optimize_in_subqueries();
> bool optimize_constant_subqueries();
> bool make_sum_func_list(List<Item> &all_fields, List<Item> &send_fields,
> bool before_group_by, bool recompute= FALSE);
> @@ -1452,9 +1482,13 @@ class JOIN :public Sql_alloc
> void get_prefix_cost_and_fanout(uint n_tables,
> double *read_time_arg,
> double *record_count_arg);
> - double get_examined_rows();
> + double get_examined_rows_estimate();
> + double total_subquery_cost();
> +
> /* defined in opt_subselect.cc */
> bool transform_max_min_subquery();
> + double static_pushdown_cost(Item *cond, uint idx);
> + bool setup_dynamic_conditions();
> /* True if this JOIN is a subquery under an IN predicate. */
> bool is_in_subquery()
> {
>
> === modified file 'sql/sql_test.cc'
> --- sql/sql_test.cc 2013-06-06 15:51:28 +0000
> +++ sql/sql_test.cc 2013-09-27 08:25:14 +0000
> @@ -365,7 +365,11 @@ print_plan(JOIN* join, uint idx, double
> pos= join->best_positions[i];
> table= pos.table->table;
> if (table)
> + {
> fputs(table->s->table_name.str, DBUG_FILE);
> + fprintf(DBUG_FILE, "(records_read: %.2f, read_time: %.2f)",
> + pos.records_read, pos.read_time);
> + }
> fputc(' ', DBUG_FILE);
> }
> }
> @@ -377,10 +381,11 @@ print_plan(JOIN* join, uint idx, double
> {
> join_table= (*plan_nodes);
> fputs(join_table->table->s->table_name.str, DBUG_FILE);
> - fprintf(DBUG_FILE, "(%lu,%lu,%lu)",
> - (ulong) join_table->found_records,
> - (ulong) join_table->records,
> - (ulong) join_table->read_time);
> + fprintf(DBUG_FILE, "(found_records: %llu, records_read: %llu, records: %llu, read_time: %.2f)",
> + (ulonglong) join_table->found_records,
> + (ulonglong) join_table->records_read,
> + (ulonglong) join_table->records,
> + join_table->read_time);
> fputc(' ', DBUG_FILE);
> }
> fputc('\n', DBUG_FILE);
>
> === modified file 'sql/sql_update.cc'
> --- sql/sql_update.cc 2013-07-17 19:24:29 +0000
> +++ sql/sql_update.cc 2013-04-01 10:36:05 +0000
> @@ -358,7 +358,7 @@ int mysql_update(THD *thd,
> }
>
> /* Apply the IN=>EXISTS transformation to all subqueries and optimize them. */
> - if (select_lex->optimize_unflattened_subqueries(false))
> + if (select_lex->optimize_subqueries(NULL))
> DBUG_RETURN(TRUE);
>
> if (select_lex->inner_refs_list.elements &&
>
> === modified file 'sql/sys_vars.cc'
> --- sql/sys_vars.cc 2013-08-26 11:26:21 +0000
> +++ sql/sys_vars.cc 2013-09-27 08:25:14 +0000
> @@ -1836,6 +1836,8 @@ export const char *optimizer_switch_name
> "table_elimination",
> "extended_keys",
> "exists_to_in",
> + "expensive_pred_static_pushdown",
> + "expensive_pred_dynamic_pushdown",
> "default", NullS
> };
> /** propagates changes to @@engine_condition_pushdown */
> @@ -1878,7 +1880,9 @@ static Sys_var_flagset Sys_optimizer_swi
> "subquery_cache, "
> "table_elimination, "
> "extended_keys, "
> - "exists_to_in "
> + "exists_to_in, "
> + "expensive_pred_static_pushdown, "
> + "expensive_pred_dynamic_pushdown"
> "} and val is one of {on, off, default}",
> SESSION_VAR(optimizer_switch), CMD_LINE(REQUIRED_ARG),
> optimizer_switch_names, DEFAULT(OPTIMIZER_SWITCH_DEFAULT),
> @@ -4243,6 +4247,36 @@ static Sys_var_harows Sys_expensive_subq
> SESSION_VAR(expensive_subquery_limit), CMD_LINE(REQUIRED_ARG),
> VALID_RANGE(0, HA_POS_ERROR), DEFAULT(100), BLOCK_SIZE(1));
>
> +static Sys_var_harows Sys_dynamic_pushdown_max_evals(
> + "dynamic_pushdown_max_evals",
> + "The number of times a dynamic condition must be evaluated in "
> + "order to trigger dynamic pushdown",
> + SESSION_VAR(dynamic_pushdown_max_evals), CMD_LINE(REQUIRED_ARG),
> + VALID_RANGE(0, HA_POS_ERROR), DEFAULT(100), BLOCK_SIZE(1));
> +
> +static Sys_var_uint Sys_dynamic_pushdown_cost_ratio(
> + "dynamic_pushdown_cost_ratio",
> + "The ratio (in %) by which a new placement of the predicate in the plan"
> + "must improve the plan cost in order to trigger dynamic pushdown.",
> + SESSION_VAR(dynamic_pushdown_cost_ratio), CMD_LINE(REQUIRED_ARG),
> + VALID_RANGE(0, 100), DEFAULT(25), BLOCK_SIZE(1));
> +
> +static Sys_var_uint Sys_dynamic_pushdown_sampling_probability(
> + "dynamic_pushdown_sampling_probability",
> + "The probability (in %) with which to evaluate this condition in order"
> + "to estimate its selectivity.",
> + SESSION_VAR(dynamic_pushdown_sampling_probability), CMD_LINE(REQUIRED_ARG),
> + VALID_RANGE(0, 100), DEFAULT(3), BLOCK_SIZE(1));
> +
> +static Sys_var_uint Sys_dynamic_pushdown_max_selectivity(
> + "dynamic_pushdown_max_selectivity",
> + "If a predicate's selectivity is below this limit, it is considered"
> + "sufficiently selective to re-filter all records in a partial blocked join."
> + "The selectivity itself is estimated by sampling the predicate with"
> + "probability 'dynamic_pushdown_sampling_probability'.",
> + SESSION_VAR(dynamic_pushdown_max_selectivity), CMD_LINE(REQUIRED_ARG),
> + VALID_RANGE(0, 100), DEFAULT(25), BLOCK_SIZE(1));
> +
> static bool check_pseudo_slave_mode(sys_var *self, THD *thd, set_var *var)
> {
> longlong previous_val= thd->variables.pseudo_slave_mode;
>
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB | Skype: sergefp | Blog: http://s.petrunia.net/blog
1
1
[Maria-developers] @@optimizer_search_depth default value - why 62 and not 0?
by Sergei Petrunia 25 Oct '13
by Sergei Petrunia 25 Oct '13
25 Oct '13
Hi Timour,
Me and Stephane were discussing a performance issue, and are wondering - why
does the @@optimizer_search_depth has a value of 62?
There is a vale of optimizer_search_depth=0 which tells the optimizer to chose
depth automatically (exhaustive if less than 7 tables, greedy search if more
than that). This value looks like it was intended to be the default.
Yet, currently the default (in both MariaDB and MySQL) is 62, which tells the
optimizer use the "new" (as in "new in 5.1") greedy search code, but sets the
depth so high that it uses exhaustive search. Is there any reason behind this?
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB | Skype: sergefp | Blog: http://s.petrunia.net/blog
1
0
Re: [Maria-developers] [Commits] Rev 3710: MariaDB made be compiled by gcc 4.8.1
by Sergei Golubchik 24 Oct '13
by Sergei Golubchik 24 Oct '13
24 Oct '13
Hi, Sanja!
On Oct 24, sanja(a)montyprogram.com wrote:
> At file:///home/bell/maria/bzr/work-maria-5.3-test/
>
> ------------------------------------------------------------
> revno: 3710
> revision-id: sanja(a)montyprogram.com-20131024170011-r5r6fd84y4x1yr09
> parent: bar(a)mnogosearch.org-20131021093629-0nyk7osjssern1sa
> committer: sanja(a)montyprogram.com
> branch nick: work-maria-5.3-test
> timestamp: Thu 2013-10-24 20:00:11 +0300
> message:
> MariaDB made be compiled by gcc 4.8.1
Okay, that looks much better.
Please push it in a stage tree, just to check that it doesn't break on
all other compilers that we use, and then push into main.
Thanks!
Regards,
Sergei
1
0
24 Oct '13
Hi, Sanja!
> >>> === modified file 'storage/maria/ma_checksum.c'
> >>> --- a/storage/maria/ma_checksum.c 2008-04-03 13:40:25 +0000
> >>> +++ b/storage/maria/ma_checksum.c 2013-10-22 07:56:05 +0000
> >>> @@ -58,7 +58,7 @@ ha_checksum _ma_checksum(MARIA_HA *info,
> >>> length= _ma_calc_blob_length(blob_size_length, pos);
> >>> if (length)
> >>> {
> >>> - memcpy((char*) &pos, pos + blob_size_length, sizeof(char*));
> >>> + memcpy((char*) &pos, pos + blob_size_length, (sizeof(char*)));
> >> Why?
> > ma_checksum.c: In function '_ma_checksum':
> > ma_checksum.c:61:60: error: argument to 'sizeof' in 'memcpy' call is the
> > same pointer type 'char *' as the destination; expected 'char' or an
> > explicit length [-Werror=sizeof-pointer-memaccess]
> > memcpy((char*) &pos, pos + blob_size_length, sizeof(char*));
> > ^
>
> shouldn't it be:
> memcpy((char**) &pos, pos + blob_size_length, sizeof(char*));
Sanja ^^^
That would be a better fix. The result looks more logical than with
meaningless parentheses.
Or, I suppose, you can try to drop the cast completely:
memcpy(&pos, pos + blob_size_length, sizeof(pos));
> > BTW all other such fixes was done because this check.
Yes, I've figured it out.
Regards,
Sergei
2
2
Hi guys, how could i optimize a hash index using like?
i was thinking about something like a index over first left bytes using
hash, example:
field LIKE "abc%"
create index hash_index( field(3) ) using hash
any idea if this could be possible? this could boost some queries here that
i'm using btree
today i'm using the LIKE, but i'm considering a virtual column + a index
over virtual column + a ( AND virtual_column='abc')
any other idea? or maybe optimize the LIKE function
--
Roberto Spadim
2
1
23 Oct '13
Hi, Sanja!
See below
On Oct 22, sanja(a)montyprogram.com wrote:
> message:
> MariaDB made be compiled by gcc 4.8.1
> === modified file 'extra/yassl/include/yassl_int.hpp'
> --- a/extra/yassl/include/yassl_int.hpp 2012-04-05 08:49:38 +0000
> +++ b/extra/yassl/include/yassl_int.hpp 2013-10-22 07:56:05 +0000
> @@ -440,7 +440,7 @@ public:
> const Ciphers& GetCiphers() const;
> const DH_Parms& GetDH_Parms() const;
> const Stats& GetStats() const;
> - const VerifyCallback getVerifyCallback() const;
> + VerifyCallback getVerifyCallback() const;
Why?
> === modified file 'storage/maria/ma_checksum.c'
> --- a/storage/maria/ma_checksum.c 2008-04-03 13:40:25 +0000
> +++ b/storage/maria/ma_checksum.c 2013-10-22 07:56:05 +0000
> @@ -58,7 +58,7 @@ ha_checksum _ma_checksum(MARIA_HA *info,
> length= _ma_calc_blob_length(blob_size_length, pos);
> if (length)
> {
> - memcpy((char*) &pos, pos + blob_size_length, sizeof(char*));
> + memcpy((char*) &pos, pos + blob_size_length, (sizeof(char*)));
Why?
> crc= my_checksum(crc, pos, length);
> }
> continue;
Regards,
Sergei
2
1
hi guys!
just a warning , bye :)
[ 99%] Building C object tests/CMakeFiles/bug25714.dir/bug25714.c.o
/home/src/mariadb/mariadb-5.5.33a/tests/bug25714.c: In function 'main':
/home/src/mariadb/mariadb-5.5.33a/tests/bug25714.c:26:7: warning: variable
'OK' set but not used [-Wunused-but-set-variable]
int OK;
^
--
Roberto Spadim
1
0
Re: [Maria-developers] Comments for parallel replication patch (MDEV-4506)
by Kristian Nielsen 18 Oct '13
by Kristian Nielsen 18 Oct '13
18 Oct '13
Michael Widenius <michael.widenius(a)gmail.com> writes:
>> What about Rotate_log_event::do_update_pos() and
>> Stop_log_event::do_update_pos()?
>>
>> They currently access the flag in rli. I am not sure that is right, as
> this
>> could be far ahead of the event they are actually executing?
>
> Aren't the above events handled by the sql driver thread?
No, by the worker threads. I do not understand how it could be done in the
driver thread? do_update_pos updates the current position of the last event
executed, it cannot happen until after events have been executed? And this
happens in the worker threads.
> My thought was that the driver thread was in charge of all reading of the
> relay log and would thus hide any stop or rotate events from the sql
> execution threads.
The Rotate event is what updates the group_master_log_name to the name of the
next binlog file on the master. This must not happen until previous events
have been executed (otherwise they will be wrongly skipped if we stop and
restart the slave).
Or did I misunderstand?
My best suggestion at the moment is to copy the required information from
m_flags (and a couple of other fields from rli) into rgi, so that the right
value gets associated with the right event in each worker thread. But let's
discuss when you are back from travelling if you have a better idea.
> The purgeing of log should be changed to check the position of the st
> committed event. It's safe to delay any older logs.
> The other I am not sure about as I can't look at the code just know.
> However we need a solution that works also for masters that don't support
> GTID.
Yes, the non-GTID case is the main problem. (Using GTID, everything is updated
automatically in parallel in the mysql.gtid_slave_pos and there is a sub_id
field that ensures we can figure out which GTIDs are the most recent ones).
- Kristian.
1
0
Hi!
Just a quick status update of parallel replication in MariaDB 10.0
I yesterday pushed to the 10.0-knielsen tree changes that makes many
of the replication variables thread safe. I will today push a patch
that should fix the rest of the variables.
Here is the changelog entry for this:
Fixes for parallel slave:
- Made slaves temporary table multi-thread slave safe by adding mutex around save_temporary_table usage.
- rli->save_temporary_tables is the active list of all used temporary tables
- This is copied to THD->temporary_tables when temporary tables are opened and updated when temporary tables are closed
- Added THD->lock_temporary_tables() and THD->unlock_temporary_tables() to simplify this.
- Relay_log_info->sql_thd renamed to Relay_log_info->sql_driver_thd to avoid wrong usage for merged code.
- Added is_part_of_group() to mark functions that are part of the next function. This replaces setting IN_STMT when events are executed.
- Added is_begin(), is_commit() and is_rollback() functions to Query_log_event to simplify code.
- If slave_skip_counter is set run things in single threaded mode. This simplifies code for skipping events.
- Updating state of relay log (IN_STMT and IN_TRANSACTION) is moved to one single function: update_state_of_relay_log()
We can't use OPTION_BEGIN to check for the state anymore as the sql_driver and sql execution threads may be different.
Clear IN_STMT and IN_TRANSACTION in init_relay_log_pos() and Relay_log_info::cleanup_context() to ensure the flags doesn't survive slave restarts
is_in_group() is now independent of state of executed transaction.
- Reset thd->transaction.all.modified_non_trans_table() if we did set it for single table row events.
This was mainly for keeping the flag as documented.
- Changed slave_open_temp_tables to uint32 to be able to use atomic operators on it.
- Relay_log_info::sleep_lock -> rpl_group_info::sleep_lock
- Relay_log_info::sleep_cond -> rpl_group_info::sleep_cond
- Changed some functions to take rpl_group_info instead of Relay_log_info to make them multi-slave safe and to simplify usage
- do_shall_skip()
- continue_group()
- sql_slave_killed()
- next_event()
- Simplifed arguments to io_slave_killed(), check_io_slave_killed() and sql_slave_killed(); No reason to supply THD as this is part of the given structure.
- set_thd_in_use_temporary_tables() removed as in_use is set on usage
- Added information to thd_proc_info() which thread is waiting for slave mutex to exit.
- In open_table() reuse code from find_temporary_table()
Other things:
- More DBUG statements
- Fixed the rpl_incident.test can be run with --debug
- More comments
- Disabled not used function rpl_connect_master()
The TODO for parallel replication is documented at top of rpl_parallel.cc.
Here is the comment:
-------
- Error handling. If we fail in one of multiple parallel executions, we
need to make a best effort to complete prior transactions and roll back
following transactions, so slave binlog position will be correct.
And all the retry logic for temporary errors like deadlock.
- Stopping the slave needs to handle stopping all parallel executions. And
the logic in sql_slave_killed() that waits for current event group to
complete needs to be extended appropriately...
- Audit the use of Relay_log_info::data_lock. Make sure it is held
correctly in all needed places also when using parallel replication.
- We need some user-configurable limit on how far ahead the SQL thread will
fetch and queue events for parallel execution (otherwise if slave gets
behind we will fill up memory with pending malloc()'ed events).
- Fix update of relay-log.info and master.info. In non-GTID replication,
they must be serialised to preserve correctness. In GTID replication, we
should not update them at all except at slave thread stop.
- All the waits (eg. in struct wait_for_commit and in
rpl_parallel_thread_pool::get_thread()) need to be killable. And on kill,
everything needs to be correctly rolled back and stopped in all threads,
to ensure a consistent slave replication state.
- Handle the case of a partial event group. This occurs when the master
crashes in the middle of writing the event group to the binlog. The
slave rolls back the transaction; parallel execution needs to be able
to deal with this wrt. commit_orderer and such.
- We should notice if the master doesn't support GTID, and then run in
single threaded mode against that master. This is needed to be able to
support multi-master-replication with old and new masters.
- Retry of failed transactions is not yet implemented for the parallel case.
----------
What this means is:
In theory things should work, as long as there is no problems in the
binary or relay log and we don't fill up memory with too many events.
We will merge this branch to 10.0-base and then to 10.0 withing the
next few days and start testing it. We also plan to release a beta
of 10.0 with this code ASAP as 10.0 is now feature complete.
During the beta phase we will fix the above outstanding issues.
Kristian will this week start working on the error handling. After
that we have to look at the memory consumption (ie, not give the sql
execution threads more work than they can handle).
I will shortly update the following article with information about how
to use parallel replication.
https://mariadb.com/kb/en/parallel-replication/
The task itself is documented at:
https://mariadb.atlassian.net/browse/MDEV-4506
Regards,
Monty
1
0
[Maria-developers] Wrong SQL thread position reporting after IO thread restart
by Pavel Ivanov 11 Oct '13
by Pavel Ivanov 11 Oct '13
11 Oct '13
Krisitan,
I found what I think is a bug in IO and SQL thread accounting. How to reproduce:
1) Set up two servers S1 and S2. S1 is a master, S2 is slave with
master_using_gtid = current_pos.
2) Execute some transactions on the master, e.g.
create database d;
create table d.t (n int);
insert into d.t values (1);
3) Both servers are at 0-1-3 now, SHOW SLAVE STATUS on S2 shows
Read_Master_Log_Pos equals to Exec_Master_Log_Pos.
4) Execute STOP SLAVE IO_THREAD on S2.
5) S2 reports in the logs: "Slave I/O thread exiting, ... GTID
position 0-1-2". So IO thread didn't realize that it received full
transaction for 0-1-3 even though it didn't receive next GTID event.
6) Execute START SLAVE IO_THREAD on S2.
7) At this point SHOW SLAVE STATUS on S2 shows Read_Master_Log_Pos the
same as in step 3, but Exec_Master_Log_Pos is now less than in step 3
as if SQL thread didn't catch up with IO thread yet. But despite both
threads running and no more transactions executed on master
Exec_Master_Log_Pos doesn't change and doesn't become equal to
Read_Master_Log_Pos. This apparently happens because IO thread
restarts from one transaction behind, adds to relay log Rotate event
that master sends with the position of that transaction, but then
doesn't add any events for the transaction because it knows they
already were added into relay log.
I think both problems are bugs. And although after fixing the first it
would be really hard (if possible) to reproduce second, I'd think the
reporting of SQL thread's position still should be fixed.
Thank you,
Pavel
2
2