Hi Varun, After this patch, the call contract of init_read_record(READ_RECORD *info, ...) function call is changed. all members of *info are ignored, except for READ_RECORD::need_unpacking which affects how init_read_record() works. Most of the callers don't follow the contract, btw. For example, mysql_delete does this: READ_RECORD info; ... error= init_read_record(&info, thd, table, select, file_sort, 1, 1, FALSE); . I would suggest the following change: - leave init_read_record() as is. - move this logic:
+ if (need_unpacking) + { + info->read_record_func_and_unpack_calls= info->read_record_func; + info->read_record_func = rr_read_record_and_unpack; + }
from init_read_record() into join_init_read_record(). Note that after that, READ_RECORD::need_unpacking member is not necessary anymore. READ_RECORD::read_record_func_and_unpack_calls is still needed (and fine to have), but it needs a better name. Does read_record_func_for_rr_and_unpack sound better ? The current code already does unpacking. The read structure is initialized in join_read_record_no_init(). Can these two places do the initialization in the same way? The second question is that I doubt why is this initalization invoked from two different places. The second big question: the code in filesort does not seem to do unpacking! find_all_keys() either uses a quick select or uses this call: error= file->ha_rnd_next(sort_form->record[0]); It doesn't call unpack functions and doesn't have a setting to do so. I'm debugging the example from the bug report, and when execution reaches here: #0 Field::make_sort_key (this=0x7ffe5409c1f8, buff=0x7ffe540aa775 '\245' <repeats 131 times>, "h4z\025", '\217' <repeats 12 times>, "\221", length=3) at /home/psergey/dev-git/10.3-review/sql/field.cc:1005 #1 0x0000555555e74e90 in make_sortkey (param=0x7fffd8643080, to=0x7ffe540aa774 "\001", '\245' <repeats 131 times>, "h4z\025", '\217' <repeats 12 times>, "\221", ref_pos=0x7ffe540a9cd0 "\360\315\nT\376\177") at /home/psergey/dev-git/10.3-review/sql/filesort.cc:1180 #2 0x0000555555e74035 in find_all_keys (thd=0x7ffe54000e00, param=0x7fffd8643080, select=0x0, fs_info=0x7ffe540aa480, buffpek_pointers=0x7fffd8643280, tempfile=0x7fffd8643110, pq=0x0, found_rows=0x7ffe540aa660) at /home/psergey/dev-git/10.3-review/sql/filesort.cc:860 #3 0x0000555555e72235 in filesort (thd=0x7ffe54000e00, table=0x7ffe540a87d8, filesort=0x7ffe540a4cb0, tracker=0x7ffe540a5a88, join=0x7ffe54017a30, first_table_bit=4) at /home/psergey/dev-git/10.3-review/sql/filesort.cc:270 it constructs the same key value every time, which - cannot be true (no duplicates after materialization) - matches what one can expect when "unpacking" didn't work. On Fri, Jan 05, 2018 at 04:59:45AM +0530, Varun wrote:
revision-id: 4720db9441fe9af7a69b59f2b097039ba256741e (mariadb-10.3.0-348-g4720db9441f) parent(s): c9ad134e56cc70119aaab8c8ac60c837fbb98dac author: Varun Gupta committer: Varun Gupta timestamp: 2018-01-05 04:53:54 +0530 message:
MDEV-13694: Wrong result upon GROUP BY with orderby_uses_equalities=on
When we have the optimization orderby_uses_equalities ON then for semi-join materialised tables we get wrong results when these table is involved in filesort. This is due to the reason that the fields that are referenced are the ones from the semi-join materialised table and not from the original table
The fix is to have the fields refer to the original table. So a copy_back technique is implemented for rr_sequential functions, which reads data from semi-join materialised tables and copies that back to the original table fields. So we need to extend this technique to other rr_* functions which are used to read the sorted data after filesort is performed
--- mysql-test/r/order_by.result | 51 ++++++++++++++++++++++++++++++++++++- mysql-test/r/order_by_innodb.result | 16 ++++++------ mysql-test/t/order_by.test | 47 ++++++++++++++++++++++++++++++++++ sql/records.cc | 20 +++++++++++++++ sql/records.h | 5 ++-- sql/sql_select.cc | 46 ++++++++++++++++++--------------- 6 files changed, 154 insertions(+), 31 deletions(-)
diff --git a/mysql-test/r/order_by.result b/mysql-test/r/order_by.result index 946ecb51426..b45eb6481a0 100644 --- a/mysql-test/r/order_by.result +++ b/mysql-test/r/order_by.result @@ -3200,10 +3200,59 @@ WHERE books.library_id = 8663 AND books.scheduled_for_removal=0 ) ORDER BY wings.id; id select_type table type possible_keys key key_len ref rows filtered Extra -1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 1 100.00 Using temporary; Using filesort +1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 1 100.00 Using filesort 1 PRIMARY wings eq_ref PRIMARY PRIMARY 4 test.books.wings_id 1 100.00 2 MATERIALIZED books ref library_idx library_idx 4 const 1 100.00 Using where Warnings: Note 1003 select `test`.`wings`.`id` AS `wing_id`,`test`.`wings`.`department_id` AS `department_id` from `test`.`wings` semi join (`test`.`books`) where `test`.`books`.`library_id` = 8663 and `test`.`books`.`scheduled_for_removal` = 0 and `test`.`wings`.`id` = `test`.`books`.`wings_id` order by `test`.`wings`.`id` set optimizer_switch= @save_optimizer_switch; DROP TABLE books, wings; +# +# Wrong result upon GROUP BY with orderby_uses_equalities=on +# Please use names t1,t2,t3, as [nearly] all tests do. Please also add the standard header:
--echo # --echo # MDEV-nnnn: .... --echo #
+CREATE TABLE person ( +PersonID MEDIUMINT(8) UNSIGNED AUTO_INCREMENT, +PRIMARY KEY (PersonID) +) ; +CREATE TABLE percat ( ...
diff --git a/sql/records.cc b/sql/records.cc index 650a51f7f37..f29554c59ef 100644 --- a/sql/records.cc +++ b/sql/records.cc @@ -49,6 +49,18 @@ static int rr_index_last(READ_RECORD *info); static int rr_index(READ_RECORD *info); static int rr_index_desc(READ_RECORD *info);
+static int rr_read_record_and_unpack(READ_RECORD *info) +{ + int error; + if ((error= info->read_record_func_and_unpack_calls(info))) + return error; + + for (Copy_field *cp= info->copy_field; cp != info->copy_field_end; cp++) + (*cp->do_copy)(cp); + + return error; +}
The existing code has rr_sequential_and_unpack function. AFAIU this function is a more general solution, so that function can be removed.
+
/** Initialize READ_RECORD structure to perform full index scan in desired @@ -189,6 +201,7 @@ bool init_read_record(READ_RECORD *info,THD *thd, TABLE *table, { IO_CACHE *tempfile; SORT_ADDON_FIELD *addon_field= filesort ? filesort->addon_field : 0; + bool need_unpacking= info->need_unpacking; DBUG_ENTER("init_read_record");
bzero((char*) info,sizeof(*info)); @@ -308,6 +321,13 @@ bool init_read_record(READ_RECORD *info,THD *thd, TABLE *table, (void) table->file->extra_opt(HA_EXTRA_CACHE, thd->variables.read_buff_size); } + + if (need_unpacking) + { + info->read_record_func_and_unpack_calls= info->read_record_func; + info->read_record_func = rr_read_record_and_unpack; + } + /* Condition pushdown to storage engine */ if ((table->file->ha_table_flags() & HA_CAN_TABLE_CONDITION_PUSHDOWN) && select && select->cond && diff --git a/sql/records.h b/sql/records.h index 940c88ca0c7..ee923297ef9 100644 --- a/sql/records.h +++ b/sql/records.h @@ -56,6 +56,7 @@ struct READ_RECORD TABLE **forms; /* head and ref forms */ Unlock_row_func unlock_row; Read_func read_record_func; + Read_func read_record_func_and_unpack_calls; THD *thd; SQL_SELECT *select; uint cache_records; @@ -67,7 +68,7 @@ struct READ_RECORD uchar *cache,*cache_pos,*cache_end,*read_positions; struct st_sort_addon_field *addon_field; /* Pointer to the fields info */ struct st_io_cache *io_cache; - bool print_error, ignore_not_found_rows; + bool print_error, ignore_not_found_rows, need_unpacking; void (*unpack)(struct st_sort_addon_field *, uchar *, uchar *);
int read_record() { return read_record_func(this); } @@ -79,7 +80,7 @@ struct READ_RECORD Copy_field *copy_field; Copy_field *copy_field_end; public: - READ_RECORD() : table(NULL), cache(NULL) {} + READ_RECORD() : table(NULL), cache(NULL), need_unpacking(false) {} ~READ_RECORD() { end_read_record(this); } };
diff --git a/sql/sql_select.cc b/sql/sql_select.cc index 8aed093af7f..98734715efe 100644 --- a/sql/sql_select.cc +++ b/sql/sql_select.cc @@ -12696,20 +12696,7 @@ remove_const(JOIN *join,ORDER *first_order, COND *cond, can be used without tmp. table. */ bool can_subst_to_first_table= false; - bool first_is_in_sjm_nest= false; - if (first_is_base_table) - { - TABLE_LIST *tbl_for_first= - join->join_tab[join->const_tables].table->pos_in_table_list; - first_is_in_sjm_nest= tbl_for_first->sj_mat_info && - tbl_for_first->sj_mat_info->is_used; - } /* - Currently we do not employ the optimization that uses multiple - equalities for ORDER BY to remove tmp table in the case when - the first table happens to be the result of materialization of - a semi-join nest ( <=> first_is_in_sjm_nest == true). - When a semi-join nest is materialized and scanned to look for possible matches in the remaining tables for every its row the fields from the result of materialization are copied @@ -12717,16 +12704,12 @@ remove_const(JOIN *join,ORDER *first_order, COND *cond, So these copies are used to access the remaining tables rather than the fields from the result of materialization.
- Unfortunately now this so-called 'copy back' technique is - supported only if the rows are scanned with the rr_sequential - function, but not with other rr_* functions that are employed + Now this so-called 'copy back' technique is + supported for all rr_* functions that are employed when the result of materialization is required to be sorted. - - TODO: either to support 'copy back' technique for the above case, - or to get rid of this technique altogether. */ if (optimizer_flag(join->thd, OPTIMIZER_SWITCH_ORDERBY_EQ_PROP) && - first_is_base_table && !first_is_in_sjm_nest && + first_is_base_table && order->item[0]->real_item()->type() == Item::FIELD_ITEM && join->cond_equal) { @@ -19693,6 +19676,8 @@ bool test_if_use_dynamic_range_scan(JOIN_TAB *join_tab)
int join_init_read_record(JOIN_TAB *tab) { + Copy_field *save_copy, *save_copy_end; + TABLE_LIST *tbl_for_first; /* Note: the query plan tree for the below operations is constructed in save_agg_explain_data. @@ -19716,9 +19701,30 @@ int join_init_read_record(JOIN_TAB *tab) tab->join->thd->reset_killed();); if (!tab->preread_init_done && tab->preread_init()) return 1; + + /* + Allow unpacking of field for semi-join materialised table when + they are involved in filesort + */ + + tbl_for_first= tab->table->pos_in_table_list; + tab->read_record.need_unpacking= tbl_for_first ? (tab->filesort && + tbl_for_first->sj_mat_info && + tbl_for_first->sj_mat_info->is_used) : FALSE; + + /* + init_read_record resets all elements of tab->read_record(). + Remember things that we don't want to have reset. + */ + save_copy= tab->read_record.copy_field; + save_copy_end= tab->read_record.copy_field_end; + if (init_read_record(&tab->read_record, tab->join->thd, tab->table, tab->select, tab->filesort_result, 1,1, FALSE)) return 1; + tab->read_record.copy_field= save_copy; + tab->read_record.copy_field_end= save_copy_end; + return tab->read_record.read_record(); }
BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog