Hi,
Summary:
1. Three questions (are you sure, what guarantee, why)
2. A couple of suggestions (spilt a commit, move function call out of
DBUG_RETURN)
3. Two suggestions for unrelated optimizations (don't allocate unneded
bitmaps, don't modify read_set)
> commit 2734eca05d499bcbcaf428d04d4f1acbb298589f
> Author: Monty <monty(a)mariadb.org>
> Date: Mon Nov 2 11:52:06 2015 +0200
>
> diff --git a/sql/table.h b/sql/table.h
> index b150e4f..6ecfdef 100644
> --- a/sql/table.h
> +++ b/sql/table.h
> @@ -1066,9 +1066,11 @@ struct TABLE
> String alias; /* alias or table name */
> uchar *null_flags;
> MY_BITMAP def_read_set, def_write_set, def_vcol_set, tmp_set;
> + MY_BITMAP def_rpl_write_set;
I would rather not allocate more bitmaps directly in the TABLE.
They are not always needed, and only increase the TABLE size.
The same is true for def_vcol_set, in fact, it's *even more true* for vcols.
Instead I'd allocate them in the TABLE's memroot as needed.
> MY_BITMAP eq_join_set; /* used to mark equi-joined fields */
> MY_BITMAP cond_set; /* used to mark fields from sargable conditions*/
> - MY_BITMAP *read_set, *write_set, *vcol_set; /* Active column sets */
> + /* Active column sets */
> + MY_BITMAP *read_set, *write_set, *vcol_set, *rpl_write_set;
> /*
> The ID of the query that opened and is using this table. Has different
> meanings depending on the table type.
> diff --git a/sql/table.cc b/sql/table.cc
> index 76cb4b8..8816ee0 100644
> --- a/sql/table.cc
> +++ b/sql/table.cc
> @@ -2884,20 +2884,28 @@ enum open_frm_error open_table_from_share(THD *thd, TABLE_SHARE *share,
> /* Allocate bitmaps */
>
> bitmap_size= share->column_bitmap_size;
> - if (!(bitmaps= (uchar*) alloc_root(&outparam->mem_root, bitmap_size*6)))
> + if (!(bitmaps= (uchar*) alloc_root(&outparam->mem_root, bitmap_size*7)))
> goto err;
> my_bitmap_init(&outparam->def_read_set,
> (my_bitmap_map*) bitmaps, share->fields, FALSE);
> my_bitmap_init(&outparam->def_write_set,
> - (my_bitmap_map*) (bitmaps+bitmap_size), share->fields, FALSE);
> + (my_bitmap_map*) (bitmaps+bitmap_size), share->fields,
> + FALSE);
> my_bitmap_init(&outparam->def_vcol_set,
> - (my_bitmap_map*) (bitmaps+bitmap_size*2), share->fields, FALSE);
> + (my_bitmap_map*) (bitmaps+bitmap_size*2), share->fields,
> + FALSE);
This is totally wasted memory if the table has no virtual columns.
(that's what I meant by *even more true* - rpl bitmap and
eq_join_set, cond_set bitmaps may or may not be used for this table,
one never knows. But for vcols it's always known in advance whether
the table has them or not).
Still, I'd allocate other bitmaps on demand too.
> my_bitmap_init(&outparam->tmp_set,
> - (my_bitmap_map*) (bitmaps+bitmap_size*3), share->fields, FALSE);
> + (my_bitmap_map*) (bitmaps+bitmap_size*3), share->fields,
> + FALSE);
> my_bitmap_init(&outparam->eq_join_set,
> - (my_bitmap_map*) (bitmaps+bitmap_size*4), share->fields, FALSE);
> + (my_bitmap_map*) (bitmaps+bitmap_size*4), share->fields,
> + FALSE);
> my_bitmap_init(&outparam->cond_set,
> - (my_bitmap_map*) (bitmaps+bitmap_size*5), share->fields, FALSE);
> + (my_bitmap_map*) (bitmaps+bitmap_size*5), share->fields,
> + FALSE);
> + my_bitmap_init(&outparam->def_rpl_write_set,
> + (my_bitmap_map*) (bitmaps+bitmap_size*6), share->fields,
> + FALSE);
> outparam->default_column_bitmaps();
>
> outparam->cond_selectivity= 1.0;
> @@ -5986,36 +5997,48 @@ void TABLE::mark_columns_needed_for_insert()
> we only want to log a PK and we needed other fields for
> execution).
> */
> +
> void TABLE::mark_columns_per_binlog_row_image()
> {
> + THD *thd= in_use;
> DBUG_ENTER("mark_columns_per_binlog_row_image");
> DBUG_ASSERT(read_set->bitmap);
> DBUG_ASSERT(write_set->bitmap);
>
> - THD *thd= current_thd;
Hmm. Are you sure that TABLE::in_use is *always* set at this point?
You've basically replaced
- THD *thd=current_thd;
- if (in_use && in_use->is_current_stmt_binlog_format_row)
+ THD *thd=in_use;
+ if (thd->is_current_stmt_binlog_format_row)
And, btw, where's your MDEV for caching rbr checks per statement?
> + /* If not using row format */
> + rpl_write_set= write_set;
>
> /**
> If in RBR we may need to mark some extra columns,
> depending on the binlog-row-image command line argument.
> */
> if ((WSREP_EMULATE_BINLOG(thd) || mysql_bin_log.is_open()) &&
> - in_use &&
> - in_use->is_current_stmt_binlog_format_row() &&
> + thd->is_current_stmt_binlog_format_row() &&
> !ha_check_storage_engine_flag(s->db_type(), HTON_NO_BINLOG_ROW_OPT))
> {
> /* if there is no PK, then mark all columns for the BI. */
> if (s->primary_key >= MAX_KEY)
> + {
> bitmap_set_all(read_set);
> + rpl_write_set= read_set;
> + }
> + else
> + {
> switch (thd->variables.binlog_row_image) {
> case BINLOG_ROW_IMAGE_FULL:
> - if (s->primary_key < MAX_KEY)
> bitmap_set_all(read_set);
> - bitmap_set_all(write_set);
> + /* Set of columns that should be written (all) */
> + rpl_write_set= read_set;
> break;
> case BINLOG_ROW_IMAGE_NOBLOB:
> - /* for every field that is not set, mark it unless it is a blob */
> + /* Only write changed columns + not blobs */
> + rpl_write_set= &def_rpl_write_set;
> + bitmap_copy(rpl_write_set, write_set);
> +
> + /*
> + for every field that is not set, mark it unless it is a blob or
> + part of a primary key
> + */
> for (Field **ptr=field ; *ptr ; ptr++)
> {
> Field *my_field= *ptr;
> @@ -6027,23 +6050,24 @@ void TABLE::mark_columns_per_binlog_row_image()
> If set in the AI, then the blob is really needed, there is
> nothing we can do about it.
> */
> - if ((s->primary_key < MAX_KEY) &&
> - ((my_field->flags & PRI_KEY_FLAG) ||
> - (my_field->type() != MYSQL_TYPE_BLOB)))
> + if ((my_field->flags & PRI_KEY_FLAG) ||
> + (my_field->type() != MYSQL_TYPE_BLOB))
> + {
> bitmap_set_bit(read_set, my_field->field_index);
> -
> - if (my_field->type() != MYSQL_TYPE_BLOB)
> - bitmap_set_bit(write_set, my_field->field_index);
> + bitmap_set_bit(rpl_write_set, my_field->field_index);
> }
> + }
> break;
> case BINLOG_ROW_IMAGE_MINIMAL:
> - /* mark the primary key if available in the read_set */
> - if (s->primary_key < MAX_KEY)
> + /* mark the primary key */
> mark_columns_used_by_index_no_reset(s->primary_key, read_set);
> + /* Only write columns that have changed */
> + rpl_write_set= write_set;
Where's the guarantee that the primary key is in the write set?
> break;
>
> default:
> DBUG_ASSERT(FALSE);
> + }
> }
> file->column_bitmaps_signal();
> }
> diff --git a/sql/opt_range.cc b/sql/opt_range.cc
> index 860426c..6201065 100644
> --- a/sql/opt_range.cc
> +++ b/sql/opt_range.cc
> @@ -11140,26 +11138,21 @@ int QUICK_RANGE_SELECT::reset()
> int QUICK_RANGE_SELECT::get_next()
> {
> range_id_t dummy;
> + int result;
> + DBUG_ENTER("QUICK_RANGE_SELECT::get_next");
> +
> + if (!in_ror_merged_scan)
> + DBUG_RETURN(file->multi_range_read_next(&dummy));
please don't put function calls inside DBUG_RETURN,
always use a variable to store the return value first
> +
> MY_BITMAP * const save_read_set= head->read_set;
> MY_BITMAP * const save_write_set= head->write_set;
> -
> - DBUG_ENTER("QUICK_RANGE_SELECT::get_next");
> - if (in_ror_merged_scan)
> - {
> /*
> We don't need to signal the bitmap change as the bitmap is always the
> same for this head->file
> */
> head->column_bitmaps_set_no_signal(&column_bitmap, &column_bitmap);
> - }
> -
> - int result= file->multi_range_read_next(&dummy);
> + result= file->multi_range_read_next(&dummy);
> -
> - if (in_ror_merged_scan)
> - {
> - /* Restore bitmaps set on entry */
> head->column_bitmaps_set_no_signal(save_read_set, save_write_set);
> - }
> DBUG_RETURN(result);
> }
Ok, all changes in opt_range.cc are unrelated to your rpl_write_set
patch, please commit them separately
> diff --git a/sql/rpl_record.cc b/sql/rpl_record.cc
> index ee222f0..208b2b6 100644
> --- a/sql/rpl_record.cc
> +++ b/sql/rpl_record.cc
> @@ -434,7 +433,7 @@ unpack_row(rpl_group_info *rgi,
> *master_reclength = table->s->reclength;
> }
>
> - DBUG_RETURN(error);
> + DBUG_RETURN(0);
> }
>
> /**
also unrelated to rpl_write_set. If you'd like you can put it in the same
commit with opt_range.cc changes, with the sybject "cleanups" or something
> diff --git a/sql/sql_class.cc b/sql/sql_class.cc
> index 5054357..b238e89 100644
> --- a/sql/sql_class.cc
> +++ b/sql/sql_class.cc
> @@ -6259,11 +6248,11 @@ int THD::binlog_delete_row(TABLE* table, bool is_trans,
> DBUG_ASSERT(is_current_stmt_binlog_format_row() &&
> ((WSREP(this) && wsrep_emulate_bin_log) || mysql_bin_log.is_open()));
> /**
> - Save a reference to the original read and write set bitmaps.
> - We will need this to restore the bitmaps at the end.
> + Save a reference to the original read bitmaps
> + We will need this to restore the bitmaps at the end as
> + binlog_prepare_row_images() may changetable->read_set.
> */
This doesn't make much sense. Here, the code is like that
THD::binlog_delete_row:
save table->read_set
THD::binlog_prepare_row_images
calculate table->tmp_set
table->read_set=table->tmp_set
pack_row(table->read_set)
for every set bit in the set_arg call field->pack()
restore table->read_set
So, two thoughts here:
1. tmp_set must be a strict subset of read_set!
other fields may not be present in the record[0]
and there're no row reads between binlog_prepare_row_images and field->pack
2. One can pass table->tmp_set directly into pack_row() as an argument
which means there's no need for binlog_prepare_row_images to modify
table->read_set at all
> MY_BITMAP *old_read_set= table->read_set;
> - MY_BITMAP *old_write_set= table->write_set;
>
> /**
> This will remove spurious fields required during execution but
> diff --git a/sql/sql_insert.cc b/sql/sql_insert.cc
> index 488826a..0661e70 100644
> --- a/sql/sql_insert.cc
> +++ b/sql/sql_insert.cc
> @@ -2867,5 +2867,7 @@ pthread_handler_t handle_delayed_insert(void *arg)
> /* Tell client that the thread is initialized */
> mysql_cond_signal(&di->cond_client);
>
> + di->table->mark_columns_needed_for_insert();
why?
> +
> /* Now wait until we get an insert or lock to handle */
> /* We will not abort as long as a client thread uses this thread */
Regards,
Sergei