Re: [Maria-developers] 52f489ebccb: MDEV-29069 follow-up: support partially suitable keys
Hi, Nikita, This review applies to the combined diff e2f8dff^..52f489e On Oct 04, Nikita Malyavin wrote:
diff --git a/sql/log_event.h b/sql/log_event.h index 91a9eeee10f..8452e665c98 100644 --- a/sql/log_event.h +++ b/sql/log_event.h @@ -5275,8 +5278,11 @@ class Rows_log_event : public Log_event uchar *m_key; /* Buffer to keep key value during searches */ KEY *m_key_info; /* Pointer to KEY info for m_key_nr */ uint m_key_nr; /* Key number */ + uint m_key_parts_suit; /* A number of key_parts suited to lookup */
that's very confusing name. Took me a while to understand what you meant. Better call it m_usable_key_parts
bool master_had_triggers; /* set after tables opening */
+ uint key_parts_suit_event(const KEY *key) const;
and this could be "get_usable_key_parts()"
+ int find_key(); // Find a best key to use in find_row() int find_row(rpl_group_info *); int write_row(rpl_group_info *, const bool); diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index 422d496d5c3..25705d13641 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -6062,7 +6065,7 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi) */ sql_mode_t saved_sql_mode= thd->variables.sql_mode; if (!is_auto_inc_in_extra_columns()) - thd->variables.sql_mode= (rpl_data.is_online_alter() ? saved_sql_mode :0) + thd->variables.sql_mode= (m_online_alter ? saved_sql_mode :0)
wouldn't it be more correct to use saved_sql_mode also in normal replication?
| MODE_NO_AUTO_VALUE_ON_ZERO;
// row processing loop @@ -7949,14 +7952,35 @@ uint8 Write_rows_log_event::get_trg_event_map() **************************************************************************/
#if defined(HAVE_REPLICATION) -/* - Compares table->record[0] and table->record[1] +/** + @brief Compares table->record[0] and table->record[1] + + @param masrter_columns a number of columns on the source replica, + 0 if ONLINE ALTER TABLE
- Returns TRUE if different. + @returns true if different. */ -static bool record_compare(TABLE *table) +static bool record_compare(TABLE *table, uint master_columns) { - bool result= FALSE; + bool result= false; + + /* + Determine if the optimized check is possible (and we can + goto record_compare_exit). + We should ensure that all extra columns (i.e. fieldnr > master_columns) + have explicit values. If not, we will exclude them from comparison, + as they can contain non-deterministic values. + + master_columns == 0 case is optimization for ONLINE ALTER to check + all columns fast. + */ + bool all_values_set= master_columns == 0 + && bitmap_is_set_all(&table->has_value_set); + for (uint i = master_columns; all_values_set && i < table->s->fields; i++) + { + all_values_set= bitmap_is_set(&table->has_value_set, i); + }
wait, this doesn't make any sense. You only do the loop if all_values_set is true. But it is true only if bitmap_is_set_all(&table->has_value_set). There is no point in doing bitmap_is_set() for individual bits, when you already know that all bits are set :) shouldn't it simply be bool all_values_set= bitmap_is_set_all(&table->has_value_set); ? but see below, perhaps you don't need all_values_set at all
+ /** Compare full record only if: - there are no blob fields (otherwise we would also need @@ -7969,40 +7993,43 @@ static bool record_compare(TABLE *table) */ if ((table->s->blob_fields + table->s->varchar_fields + - table->s->null_fields) == 0) + table->s->null_fields) == 0 + && all_values_set) { result= cmp_record(table,record[1]); goto record_compare_exit; }
/* Compare null bits */ - if (memcmp(table->null_flags, - table->null_flags+table->s->rec_buff_length, - table->s->null_bytes)) + if (all_values_set && memcmp(table->null_flags, + table->null_flags+table->s->rec_buff_length, + table->s->null_bytes)) { - result= TRUE; // Diff in NULL value + result= true; // Diff in NULL value goto record_compare_exit; }
/* Compare fields */ for (Field **ptr=table->field ; *ptr ; ptr++) { + Field *f= *ptr; if (table->versioned() && (*ptr)->vers_sys_field()) { continue; } - /** - We only compare field contents that are not null. - NULL fields (i.e., their null bits) were compared - earlier. - */ - if (!(*(ptr))->is_null()) + /* + We only compare fields that exist on the source (or in ONLINE + ALTER case, that were in the original table). For reference, + replica tables can also contain extra fields. + */ + if (f->field_index > master_columns && !f->has_explicit_value()) + continue;
same question as in a previous review. Do you mean that when f->field_index < master_columns, it's ok to compare values below even when !f->has_explicit_value() ?
+ + if (f->is_null() != f->is_null(table->s->rec_buff_length) + || f->cmp_binary_offset(table->s->rec_buff_length))
may be it's better to create a fast copy of this function, like if (bitmap_is_set_all(&table->has_value_set)) { ... old record_compare() code starting from if ((table->s->blob_fields + table->s->varchar_fields + table->s->null_fields) == 0) } else { ... your code, only the loop over fields. } in this case you won't even need all_values_set or master_columns
{ - if ((*ptr)->cmp_binary_offset(table->s->rec_buff_length)) - { - result= TRUE; - goto record_compare_exit; - } + result= true; + goto record_compare_exit; } }
@@ -8010,6 +8037,32 @@ static bool record_compare(TABLE *table) return result; }
+/** + Newly added fields with non-deterministic defaults (i.e. DEFAULT(RANDOM()), + CURRENT_TIMESTAMP, AUTO_INCREMENT) should be excluded from key search. + Basically we exclude all the default-filled fields based on + has_explicit_value bitmap. + */ +uint Rows_log_event::key_parts_suit_event(const KEY *key) const +{ + uint master_columns= m_online_alter ? 0 : m_cols.n_bits;
get_master_cols()
+ uint p= 0; + for (;p < key->ext_key_parts; p++) + { + Field *f= key->key_part[p].field; + uint non_deterministic_default= f->default_value && + f->default_value->flags | VCOL_NOT_STRICTLY_DETERMINISTIC; + + int next_number_field= f->table->next_number_field ? + f->table->next_number_field->field_index : -1;
eh? 1. why it's *in the loop* ? 2. why not simply || f == f->table->next_number_field or, to be extra safe || f->table->field[f->field_index] == f->table->next_number_field
+ + if (f->field_index >= master_columns
I don't think this condition ^^^ is needed. Same as elsewhere, if f->field_index < master_columns but !bitmap_is_set, does it mean the keyseg is ok? I don't think so.
+ && !bitmap_is_set(&m_table->has_value_set, f->field_index) + && (non_deterministic_default || next_number_field == f->field_index)) + break; + } + return p; +}
/** Find the best key to use when locating the row in @c find_row(). @@ -8024,13 +8077,26 @@ static bool record_compare(TABLE *table) */ int Rows_log_event::find_key() { - uint i, best_key_nr, last_part; + uint i, best_key_nr; KEY *key, *UNINIT_VAR(best_key); ulong UNINIT_VAR(best_rec_per_key), tmp; DBUG_ENTER("Rows_log_event::find_key"); DBUG_ASSERT(m_table);
+ #ifdef DBUG_TRACE
#ifndef DBUG_OFF
+ auto _ = make_scope_exit([this]() { + DBUG_EXECUTE_IF("rpl_report_chosen_key", + push_warning_printf(m_table->in_use, + Sql_condition::WARN_LEVEL_NOTE, + ER_UNKNOWN_ERROR, "Key chosen: %d", + m_key_nr == MAX_KEY ? + -1 : m_key_nr);); + }); + #endif + + m_key_nr= MAX_KEY; best_key_nr= MAX_KEY; + uint parts_suit= 0;
/* Keys are sorted so that any primary key is first, followed by unique keys, @@ -8041,33 +8107,38 @@ int Rows_log_event::find_key() { if (!m_table->s->keys_in_use.is_set(i)) continue; + parts_suit= key_parts_suit_event(key); + if (!parts_suit) + continue; /* We cannot use a unique key with NULL-able columns to uniquely identify a row (but we can still select it for range scan below if nothing better is available). */ - if ((key->flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME) + if ((key->flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME && + parts_suit == key->user_defined_key_parts)
this doesn't make use of ext key parts. Should be something like if (parts_suit == key->ext_key_parts || (parts_suit >= key->user_defined_key_parts && (key->flags & ....)))
{ best_key_nr= i; best_key= key; + m_key_parts_suit= parts_suit; break; } /* We can only use a non-unique key if it allows range scans (ie. skip FULLTEXT indexes and such). */ - last_part= key->user_defined_key_parts - 1; DBUG_PRINT("info", ("Index %s rec_per_key[%u]= %lu", - key->name.str, last_part, key->rec_per_key[last_part])); - if (!(m_table->file->index_flags(i, last_part, 1) & HA_READ_NEXT)) + key->name.str, parts_suit, key->rec_per_key[parts_suit])); + if (!(m_table->file->index_flags(i, parts_suit, 1) & HA_READ_NEXT)) continue;
- tmp= key->rec_per_key[last_part]; + tmp= key->rec_per_key[parts_suit - 1]; if (best_key_nr == MAX_KEY || (tmp > 0 && tmp < best_rec_per_key)) { best_key_nr= i; best_key= key; best_rec_per_key= tmp; + m_key_parts_suit= parts_suit; } }
@@ -8494,12 +8568,20 @@ Delete_rows_log_event::do_before_row_operations(const Slave_reporting_capability if (get_flags(STMT_END_F)) status_var_increment(thd->status_var.com_stat[SQLCOM_DELETE]);
+ const uchar *curr_row_end= m_curr_row_end; + unpack_row(rgi, m_table, m_width, m_curr_row, &m_cols, + &curr_row_end, &m_master_reclength, m_rows_end); + + KEY *pk_info= m_table->key_info + m_table->s->primary_key; if ((m_table->file->ha_table_flags() & HA_PRIMARY_KEY_REQUIRED_FOR_POSITION) && - m_table->s->primary_key < MAX_KEY) + m_table->s->primary_key < MAX_KEY && + key_parts_suit_event(pk_info) == pk_info->ext_key_parts) { /* We don't need to allocate any memory for m_key since it is not used. */ + m_key_nr= m_table->s->primary_key; + m_key_parts_suit= pk_info->ext_key_parts; return 0; } if (do_invoke_trigger())
already commented in the previous review that if "We don't need to allocate any memory" than you `return 0` early wthout doing do_invoke_trigger(). Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hello Sergei! On Tue, 4 Oct 2022 at 21:02, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Nikita,
This review applies to the combined diff e2f8dff^..52f489e
On Oct 04, Nikita Malyavin wrote:
diff --git a/sql/log_event.h b/sql/log_event.h index 91a9eeee10f..8452e665c98 100644 --- a/sql/log_event.h +++ b/sql/log_event.h @@ -5275,8 +5278,11 @@ class Rows_log_event : public Log_event uchar *m_key; /* Buffer to keep key value during searches */ KEY *m_key_info; /* Pointer to KEY info for m_key_nr */ uint m_key_nr; /* Key number */ + uint m_key_parts_suit; /* A number of key_parts suited to lookup */
that's very confusing name. Took me a while to understand what you meant. Better call it m_usable_key_parts
bool master_had_triggers; /* set after tables opening */
+ uint key_parts_suit_event(const KEY *key) const;
and this could be "get_usable_key_parts()"
Okay, no problem. Both sound suitable, or usable, for me:)
+ int find_key(); // Find a best key to use in find_row() int find_row(rpl_group_info *); int write_row(rpl_group_info *, const bool); diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index 422d496d5c3..25705d13641 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -6062,7 +6065,7 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi) */ sql_mode_t saved_sql_mode= thd->variables.sql_mode; if (!is_auto_inc_in_extra_columns()) - thd->variables.sql_mode= (rpl_data.is_online_alter() ? saved_sql_mode :0) + thd->variables.sql_mode= (m_online_alter ? saved_sql_mode :0)
wouldn't it be more correct to use saved_sql_mode also in normal replication?
using MODE_NO_AUTO_VALUE_ON_ZERO was a fix to Bug #56662 Assertion failed: next_insert_id == 0, file .\handler.cc. See commit d5bf6b8aa8bb
| MODE_NO_AUTO_VALUE_ON_ZERO;
// row processing loop @@ -7949,14 +7952,35 @@ uint8 Write_rows_log_event::get_trg_event_map()
**************************************************************************/
#if defined(HAVE_REPLICATION) -/* - Compares table->record[0] and table->record[1] +/** + @brief Compares table->record[0] and table->record[1] + + @param masrter_columns a number of columns on the source replica, + 0 if ONLINE ALTER TABLE
- Returns TRUE if different. + @returns true if different. */ -static bool record_compare(TABLE *table) +static bool record_compare(TABLE *table, uint master_columns) { - bool result= FALSE; + bool result= false; + + /* + Determine if the optimized check is possible (and we can + goto record_compare_exit). + We should ensure that all extra columns (i.e. fieldnr >
master_columns)
+ have explicit values. If not, we will exclude them from comparison, + as they can contain non-deterministic values. + + master_columns == 0 case is optimization for ONLINE ALTER to check + all columns fast. + */ + bool all_values_set= master_columns == 0 + && bitmap_is_set_all(&table->has_value_set); + for (uint i = master_columns; all_values_set && i < table->s->fields; i++) + { + all_values_set= bitmap_is_set(&table->has_value_set, i); + }
wait, this doesn't make any sense. You only do the loop if all_values_set is true. But it is true only if bitmap_is_set_all(&table->has_value_set). There is no point in doing bitmap_is_set() for individual bits, when you already know that all bits are set :)
shouldn't it simply be
bool all_values_set= bitmap_is_set_all(&table->has_value_set);
? but see below, perhaps you don't need all_values_set at all
Yes, you are correct here. We've been throught this on a previous e-mail, it should have been fixed already to the moment of your review. Perhaps, some part is not included here.
/** Compare full record only if: - there are no blob fields (otherwise we would also need @@ -7969,40 +7993,43 @@ static bool record_compare(TABLE *table) */ if ((table->s->blob_fields + table->s->varchar_fields + - table->s->null_fields) == 0) + table->s->null_fields) == 0 + && all_values_set) { result= cmp_record(table,record[1]); goto record_compare_exit; }
/* Compare null bits */ - if (memcmp(table->null_flags, - table->null_flags+table->s->rec_buff_length, - table->s->null_bytes)) + if (all_values_set && memcmp(table->null_flags, +
+ table->null_flags+table->s->rec_buff_length,
+ table->s->null_bytes)) { - result= TRUE; // Diff in NULL value + result= true; // Diff in NULL value goto record_compare_exit; }
/* Compare fields */ for (Field **ptr=table->field ; *ptr ; ptr++) { + Field *f= *ptr; if (table->versioned() && (*ptr)->vers_sys_field()) { continue; } - /** - We only compare field contents that are not null. - NULL fields (i.e., their null bits) were compared - earlier. - */ - if (!(*(ptr))->is_null()) + /* + We only compare fields that exist on the source (or in ONLINE + ALTER case, that were in the original table). For reference, + replica tables can also contain extra fields. + */ + if (f->field_index > master_columns && !f->has_explicit_value()) + continue;
same question as in a previous review. Do you mean that when f->field_index < master_columns, it's ok to compare values below even when !f->has_explicit_value() ?
Note, however, that it is an important condition in find_key(). See the commits of the next review:)
+ + if (f->is_null() != f->is_null(table->s->rec_buff_length) + || f->cmp_binary_offset(table->s->rec_buff_length))
may be it's better to create a fast copy of this function, like
if (bitmap_is_set_all(&table->has_value_set)) { ... old record_compare() code starting from if ((table->s->blob_fields + table->s->varchar_fields + table->s->null_fields) == 0) } else { ... your code, only the loop over fields. }
in this case you won't even need all_values_set or master_columns
I don't see a demand on refactoring this. The function looks readable enough for me, and it's less than two screens, while there are functions begging to be refactored by someone:) I guess you might not like all_values_set checked two times in a row... Well, in this particular case i'm sure it will be perfectly optimized out.
{ - if ((*ptr)->cmp_binary_offset(table->s->rec_buff_length)) - { - result= TRUE; - goto record_compare_exit; - } + result= true; + goto record_compare_exit; } }
@@ -8010,6 +8037,32 @@ static bool record_compare(TABLE *table) return result; }
+/** + Newly added fields with non-deterministic defaults (i.e.
DEFAULT(RANDOM()),
+ CURRENT_TIMESTAMP, AUTO_INCREMENT) should be excluded from key search. + Basically we exclude all the default-filled fields based on + has_explicit_value bitmap. + */ +uint Rows_log_event::key_parts_suit_event(const KEY *key) const +{ + uint master_columns= m_online_alter ? 0 : m_cols.n_bits;
get_master_cols()
👍
+ uint p= 0; + for (;p < key->ext_key_parts; p++) + { + Field *f= key->key_part[p].field; + uint non_deterministic_default= f->default_value && + f->default_value->flags | VCOL_NOT_STRICTLY_DETERMINISTIC; + + int next_number_field= f->table->next_number_field ? + f->table->next_number_field->field_index : -1;
eh? 1. why it's *in the loop* ? 2. why not simply
|| f == f->table->next_number_field
or, to be extra safe
|| f->table->field[f->field_index] == f->table->next_number_field
Thanks! Turns out there's no need in extra safety, it just works.
+ + if (f->field_index >= master_columns
I don't think this condition ^^^ is needed. Same as elsewhere, if f->field_index < master_columns but !bitmap_is_set, does it mean the keyseg is ok? I don't think so.
It *is* needed. An extra field with default can be used in the key lookup. A usual field should have an explicit value. If it's not, it means it's not sent (MINIMAL/NOBLOB) and we don't know what real value is
+ && !bitmap_is_set(&m_table->has_value_set, f->field_index) + && (non_deterministic_default || next_number_field == f->field_index)) + break; + } + return p; +}
/** Find the best key to use when locating the row in @c find_row(). @@ -8024,13 +8077,26 @@ static bool record_compare(TABLE *table) */ int Rows_log_event::find_key() { - uint i, best_key_nr, last_part; + uint i, best_key_nr; KEY *key, *UNINIT_VAR(best_key); ulong UNINIT_VAR(best_rec_per_key), tmp; DBUG_ENTER("Rows_log_event::find_key"); DBUG_ASSERT(m_table);
+ #ifdef DBUG_TRACE
#ifndef DBUG_OFF
Pity we have a double negation in similar places, I tried to avoid it and came across that macro
+ auto _ = make_scope_exit([this]() { + DBUG_EXECUTE_IF("rpl_report_chosen_key", + push_warning_printf(m_table->in_use, + Sql_condition::WARN_LEVEL_NOTE, + ER_UNKNOWN_ERROR, "Key chosen: %d", + m_key_nr == MAX_KEY ? + -1 : m_key_nr);); + }); + #endif + + m_key_nr= MAX_KEY; best_key_nr= MAX_KEY; + uint parts_suit= 0;
/* Keys are sorted so that any primary key is first, followed by unique keys, @@ -8041,33 +8107,38 @@ int Rows_log_event::find_key() { if (!m_table->s->keys_in_use.is_set(i)) continue; + parts_suit= key_parts_suit_event(key); + if (!parts_suit) + continue; /* We cannot use a unique key with NULL-able columns to uniquely identify a row (but we can still select it for range scan below if nothing better is available). */ - if ((key->flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME) + if ((key->flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME && + parts_suit == key->user_defined_key_parts)
this doesn't make use of ext key parts. Should be something like
why not just use ext_key_parts (which I'm using now btw)?
if (parts_suit == key->ext_key_parts ||
(parts_suit >= key->user_defined_key_parts && (key->flags & ....)))
{ best_key_nr= i; best_key= key; + m_key_parts_suit= parts_suit; break; } /* We can only use a non-unique key if it allows range scans (ie.
skip
FULLTEXT indexes and such). */ - last_part= key->user_defined_key_parts - 1; DBUG_PRINT("info", ("Index %s rec_per_key[%u]= %lu", - key->name.str, last_part,
key->rec_per_key[last_part]));
- if (!(m_table->file->index_flags(i, last_part, 1) & HA_READ_NEXT)) + key->name.str, parts_suit, key->rec_per_key[parts_suit])); + if (!(m_table->file->index_flags(i, parts_suit, 1) & HA_READ_NEXT)) continue;
- tmp= key->rec_per_key[last_part]; + tmp= key->rec_per_key[parts_suit - 1]; if (best_key_nr == MAX_KEY || (tmp > 0 && tmp < best_rec_per_key)) { best_key_nr= i; best_key= key; best_rec_per_key= tmp; + m_key_parts_suit= parts_suit; } }
-- Yours truly, Nikita Malyavin
Hi, Nikita, On Oct 17, Nikita Malyavin wrote:
Hello Sergei!
On Tue, 4 Oct 2022 at 21:02, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Nikita,
This review applies to the combined diff e2f8dff^..52f489e
On Oct 04, Nikita Malyavin wrote:
diff --git a/sql/log_event.h b/sql/log_event.h index 91a9eeee10f..8452e665c98 100644 --- a/sql/log_event.h +++ b/sql/log_event.h @@ -5275,8 +5278,11 @@ class Rows_log_event : public Log_event uchar *m_key; /* Buffer to keep key value during searches */ KEY *m_key_info; /* Pointer to KEY info for m_key_nr */ uint m_key_nr; /* Key number */ + uint m_key_parts_suit; /* A number of key_parts suited to lookup */
that's very confusing name. Took me a while to understand what you meant. Better call it m_usable_key_parts
bool master_had_triggers; /* set after tables opening */
+ uint key_parts_suit_event(const KEY *key) const;
and this could be "get_usable_key_parts()"
Okay, no problem. Both sound suitable, or usable, for me:)
and is similar to KEY::usable_key_parts
diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index 422d496d5c3..25705d13641 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -6062,7 +6065,7 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi) */ sql_mode_t saved_sql_mode= thd->variables.sql_mode; if (!is_auto_inc_in_extra_columns()) - thd->variables.sql_mode= (rpl_data.is_online_alter() ? saved_sql_mode :0) + thd->variables.sql_mode= (m_online_alter ? saved_sql_mode :0) | MODE_NO_AUTO_VALUE_ON_ZERO;
wouldn't it be more correct to use saved_sql_mode also in normal replication?
using MODE_NO_AUTO_VALUE_ON_ZERO was a fix to Bug #56662 Assertion failed: next_insert_id == 0, file .\handler.cc. See commit d5bf6b8aa8bb
No, I didn't mean to ask "why MODE_NO_AUTO_VALUE_ON_ZERO", I meant to ask why not thd->variables.sql_mode= saved_sql_mode | MODE_NO_AUTO_VALUE_ON_ZERO; that is why not "to use saved_sql_mode also in normal replication"
+ uint p= 0; + for (;p < key->ext_key_parts; p++) + { + Field *f= key->key_part[p].field; + uint non_deterministic_default= f->default_value && + f->default_value->flags | VCOL_NOT_STRICTLY_DETERMINISTIC; + + int next_number_field= f->table->next_number_field ? + f->table->next_number_field->field_index : -1;
eh? 1. why it's *in the loop* ? 2. why not simply
|| f == f->table->next_number_field
or, to be extra safe
|| f->table->field[f->field_index] == f->table->next_number_field
Thanks! Turns out there's no need in extra safety, it just works.
by "extra safety" I meant the case when f != f->table->field[f->field_index] (here f= key->key_part[p].field). This is the case for varchar/blob prefix keys. But as they cannot be auto-increment anyway, the direct comparison should work, I suppose.
+ + if (f->field_index >= master_columns
I don't think this condition ^^^ is needed. Same as elsewhere, if f->field_index < master_columns but !bitmap_is_set, does it mean the keyseg is ok? I don't think so.
It *is* needed.
An extra field with default can be used in the key lookup. A usual field should have an explicit value. If it's not, it means it's not sent (MINIMAL/NOBLOB) and we don't know what real value is
I don't understand how this answers my question. Let me rephrase. What will break if you remove the condition f->field_index >= master_columns && ?
+ && !bitmap_is_set(&m_table->has_value_set, f->field_index) + && (non_deterministic_default || next_number_field == f->field_index)) + break; + } + return p; +}
/** Find the best key to use when locating the row in @c find_row(). @@ -8024,13 +8077,26 @@ static bool record_compare(TABLE *table) */ int Rows_log_event::find_key() { - uint i, best_key_nr, last_part; + uint i, best_key_nr; KEY *key, *UNINIT_VAR(best_key); ulong UNINIT_VAR(best_rec_per_key), tmp; DBUG_ENTER("Rows_log_event::find_key"); DBUG_ASSERT(m_table);
+ #ifdef DBUG_TRACE
#ifndef DBUG_OFF
Pity we have a double negation in similar places, I tried to avoid it and came across that macro
Yes, `#ifndef DBUG_OFF` is historical, and unfortunately this is how it should be checked. DBUG_TRACE is something Marko added recently and it's when you want to enable DBUG_PRINT, as far as I remember, but disable the rest of DBUG_* macros. You have DBUG_EXECUTE_IF, so it shouldn't be enabled by DBUG_TRACE.
+ auto _ = make_scope_exit([this]() { + DBUG_EXECUTE_IF("rpl_report_chosen_key", + push_warning_printf(m_table->in_use, + Sql_condition::WARN_LEVEL_NOTE, + ER_UNKNOWN_ERROR, "Key chosen: %d", + m_key_nr == MAX_KEY ? + -1 : m_key_nr);); + }); + #endif + + m_key_nr= MAX_KEY; best_key_nr= MAX_KEY; + uint parts_suit= 0;
@@ -8041,33 +8107,38 @@ int Rows_log_event::find_key() { if (!m_table->s->keys_in_use.is_set(i)) continue; + parts_suit= key_parts_suit_event(key); + if (!parts_suit) + continue; /* We cannot use a unique key with NULL-able columns to uniquely identify a row (but we can still select it for range scan below if nothing better is available). */ - if ((key->flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME) + if ((key->flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME && + parts_suit == key->user_defined_key_parts)
this doesn't make use of ext key parts. Should be something like
if (parts_suit == key->ext_key_parts || (parts_suit >= key->user_defined_key_parts && (key->flags & ....)))
why not just use ext_key_parts (which I'm using now btw)?
Now, you mean, in the next patch? I'll check it out. The logic of my if() above is "either it uses the extended key (which is always truly unique) - or it uses only user specified key parts and has HA_NOSAME but no HA_NULL_PART_KEY". Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
On Tue, 18 Oct 2022 at 01:01, Sergei Golubchik <serg@mariadb.org> wrote:
diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index 422d496d5c3..25705d13641 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -6062,7 +6065,7 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi) */ sql_mode_t saved_sql_mode= thd->variables.sql_mode; if (!is_auto_inc_in_extra_columns()) - thd->variables.sql_mode= (rpl_data.is_online_alter() ? saved_sql_mode :0) + thd->variables.sql_mode= (m_online_alter ? saved_sql_mode :0) | MODE_NO_AUTO_VALUE_ON_ZERO;
wouldn't it be more correct to use saved_sql_mode also in normal replication?
using MODE_NO_AUTO_VALUE_ON_ZERO was a fix to Bug #56662 Assertion failed: next_insert_id == 0, file .\handler.cc. See commit d5bf6b8aa8bb
No, I didn't mean to ask "why MODE_NO_AUTO_VALUE_ON_ZERO", I meant to ask why not
thd->variables.sql_mode= saved_sql_mode | MODE_NO_AUTO_VALUE_ON_ZERO;
that is why not "to use saved_sql_mode also in normal replication"
Oh, sorry for misinterpretation. I have read what I wanted to read:) Meanwhile I had a time to think. Imo it should be so in replication. It is a background process, right? So we have no control over sql_mode in the replica nodes (if I understand the design correctly). So we should make it as permissive as possible. I have no opinion on NO_ZERO_DATE and other "View" (in terms of Model-View-Controller) modes, like EMPTY_STRING_IS_NULL, if they don't affect storage, they are unimportant. For the rest, we should allow as much execution, and as much conversions as possible, which is, as I recall, is achieved by zeroing the mode, and MODE_NO_AUTO_VALUE_ON_ZERO is an exception that was found once. On the contrary, in ALTER TABLE, it is clear that we should preserve the user mode, as we have full control over it, and we want to preserve usual ALTER TABLE behavior. Though I suppose, there is a known issue that we should handle, so we have to override MODE_NO_AUTO_VALUE_ON_ZERO. This is very likely, as there's also such code in the beginning of copy_data_between_tables: if (def->field) { if (*ptr == to->next_number_field) { auto_increment_field_copied= TRUE; /* If we are going to copy contents of one auto_increment column to another auto_increment column it is sensible to preserve zeroes. This condition also covers case when we are don't actually alter auto_increment column. */ if (def->field == from->found_next_number_field) thd->variables.sql_mode|= MODE_NO_AUTO_VALUE_ON_ZERO; }
+ uint p= 0; + for (;p < key->ext_key_parts; p++) + { + Field *f= key->key_part[p].field; + uint non_deterministic_default= f->default_value && + f->default_value->flags | VCOL_NOT_STRICTLY_DETERMINISTIC; + + int next_number_field= f->table->next_number_field ? + f->table->next_number_field->field_index : -1;
eh? 1. why it's *in the loop* ? 2. why not simply
|| f == f->table->next_number_field
or, to be extra safe
|| f->table->field[f->field_index] == f->table->next_number_field
Thanks! Turns out there's no need in extra safety, it just works.
by "extra safety" I meant the case when
f != f->table->field[f->field_index]
(here f= key->key_part[p].field). This is the case for varchar/blob prefix keys. But as they cannot be auto-increment anyway, the direct comparison should work, I suppose.
Hmm, thanks for noticing, I really didn't consider this blobs case.
+ + if (f->field_index >= master_columns
I don't think this condition ^^^ is needed. Same as elsewhere, if f->field_index < master_columns but !bitmap_is_set, does it mean the keyseg is ok? I don't think so.
It *is* needed.
An extra field with default can be used in the key lookup. A usual field should have an explicit value. If it's not, it means it's not sent (MINIMAL/NOBLOB) and we don't know what real value is
I don't understand how this answers my question. Let me rephrase. What will break if you remove the condition
f->field_index >= master_columns &&
?
MINIMAL/NOBLOB replication will break. you can remove the line and see what breaks. I think this one can fail for example: create or replace table t (a int primary key, b int default (a+1)); insert into t values (10, 10),(20, 20),(30, 30); --sync_slave_with_master alter table t drop primary key, add c int default(a), add primary key(b); --connection master delete from t where a = 20; update t set a = a + 2 where a = 10; --sync_slave_with_master select * from t; --echo # Cleanup --connection master drop table t; Here, we add a primary key on the field b which is a master column, though it wasn't replicated. Even though it has a deterministic default it can't be used, or the values would be not 10, 30 but 13, 31
@@ -8041,33 +8107,38 @@ int Rows_log_event::find_key()
{ if (!m_table->s->keys_in_use.is_set(i)) continue; + parts_suit= key_parts_suit_event(key); + if (!parts_suit) + continue; /* We cannot use a unique key with NULL-able columns to uniquely identify a row (but we can still select it for range scan below if nothing better is available). */ - if ((key->flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME) + if ((key->flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME && + parts_suit == key->user_defined_key_parts)
this doesn't make use of ext key parts. Should be something like
if (parts_suit == key->ext_key_parts || (parts_suit >= key->user_defined_key_parts && (key->flags & ....)))
why not just use ext_key_parts (which I'm using now btw)?
Now, you mean, in the next patch? I'll check it out.
The logic of my if() above is "either it uses the extended key (which is always truly unique) - or it uses only user specified key parts and has HA_NOSAME but no HA_NULL_PART_KEY".
Why can't we use a part of extended key part in the same way? I don't know a good example, but assume it could be long uniques. As long as index algorithm in BTREE, i'd expect it to work. And for other indexes, like HASH, i think, there was no support anyway -- Yours truly, Nikita Malyavin
Hi, Nikita, On Oct 18, Nikita Malyavin wrote:
On Tue, 18 Oct 2022 at 01:01, Sergei Golubchik <serg@mariadb.org> wrote:
diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index 422d496d5c3..25705d13641 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -6062,7 +6065,7 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi) */ sql_mode_t saved_sql_mode= thd->variables.sql_mode; if (!is_auto_inc_in_extra_columns()) - thd->variables.sql_mode= (rpl_data.is_online_alter() ? saved_sql_mode :0) + thd->variables.sql_mode= (m_online_alter ? saved_sql_mode :0) | MODE_NO_AUTO_VALUE_ON_ZERO;
No, I didn't mean to ask "why MODE_NO_AUTO_VALUE_ON_ZERO", I meant to ask why not
thd->variables.sql_mode= saved_sql_mode | MODE_NO_AUTO_VALUE_ON_ZERO;
that is why not "to use saved_sql_mode also in normal replication"
Oh, sorry for misinterpretation. I have read what I wanted to read:)
Meanwhile I had a time to think. Imo it should be so in replication. It is a background process, right? So we have no control over sql_mode in the replica nodes (if I understand the design correctly).
So we should make it as permissive as possible. I have no opinion on NO_ZERO_DATE and other "View" (in terms of Model-View-Controller) modes, like EMPTY_STRING_IS_NULL, if they don't affect storage, they are unimportant. For the rest, we should allow as much execution, and as much conversions as possible, which is, as I recall, is achieved by zeroing the mode, and MODE_NO_AUTO_VALUE_ON_ZERO is an exception that was found once.
Yes, agree. You're right.
eh? 1. why it's *in the loop* ? 2. why not simply
|| f == f->table->next_number_field
or, to be extra safe
|| f->table->field[f->field_index] == f->table->next_number_field
Thanks! Turns out there's no need in extra safety, it just works.
by "extra safety" I meant the case when
f != f->table->field[f->field_index]
(here f= key->key_part[p].field). This is the case for varchar/blob prefix keys. But as they cannot be auto-increment anyway, the direct comparison should work, I suppose.
Hmm, thanks for noticing, I really didn't consider this blobs case.
still, you didn't use `f == f->table->next_number_field` but we concluded it's safe, didn't we? === Also, I've looked at your latest branch. What were you optimizing with the commit 3afa3288221 (the one with usable_keys_data)? It's complex and I don't quite see what's the purpose of it. It looks like all you need to do is to decide on the best index to use, once, save it somewhere, e.g. in RPL_TABLE_LIST, and that's it. This can be done, for example, in copy_data_between_tables(). Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Hi! On Sun, 30 Oct 2022 at 18:20, Sergei Golubchik <serg@mariadb.org> wrote:
still, you didn't use `f == f->table->next_number_field` but we concluded it's safe, didn't we?
Yes, I just think I only fixed it in the last commit "improve DEFAULT rules" (13222947)
Also, I've looked at your latest branch. What were you optimizing with the commit 3afa3288221 (the one with usable_keys_data)?
It's complex and I don't quite see what's the purpose of it. It looks like all you need to do is to decide on the best index to use, once, save it somewhere, e.g. in RPL_TABLE_LIST, and that's it.
Yes, and I do it that way, just few details: * There was no way to access RPL_TABLE_LIST in find_key directly, Rpl_table_data was used for that. Maybe we should return RPL_TABLE_LIST directly everywhere, as I had found no example where RPL_TABLE_LIST is unavailable. 0e04e82463 is made for that purpose. * I didn't want to allocate memory as soon as replication is started. What if there are only write_row events? So it is done lazily. * Sometimes the key set can change, if row_format was changed. So it needs to be recalculated
This can be done, for example, in copy_data_between_tables().
For ONLINE ALTER TABLE yes, but what about usual replication? I'd prefer one generic algorithm for everything. -- Yours truly, Nikita Malyavin
Hi, Nikita, On Oct 31, Nikita Malyavin wrote:
Also, I've looked at your latest branch. What were you optimizing with the commit 3afa3288221 (the one with usable_keys_data)?
It's complex and I don't quite see what's the purpose of it. It looks like all you need to do is to decide on the best index to use, once, save it somewhere, e.g. in RPL_TABLE_LIST, and that's it.
This can be done, for example, in copy_data_between_tables().
For ONLINE ALTER TABLE yes, but what about usual replication? I'd prefer one generic algorithm for everything.
For replication it already works. The key is found once, before a batch of rows. It cannot be done less often than that, because the next row event can be for a different table and have a different row format. For online alter all events are for the same table and all events have full rows. There is no need to recalculate the key per event. Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
On Mon, 31 Oct 2022 at 14:08, Sergei Golubchik <serg@mariadb.org> wrote:
Hi, Nikita,
On Oct 31, Nikita Malyavin wrote:
Also, I've looked at your latest branch. What were you optimizing with the commit 3afa3288221 (the one with usable_keys_data)?
It's complex and I don't quite see what's the purpose of it. It looks like all you need to do is to decide on the best index to use, once, save it somewhere, e.g. in RPL_TABLE_LIST, and that's it.
This can be done, for example, in copy_data_between_tables().
For ONLINE ALTER TABLE yes, but what about usual replication? I'd prefer one generic algorithm for everything.
For replication it already works.
It didn't work for cases with extra columns, when master a primary key, or some other key was amended with an extra column, for example. Or when the set of keys has been changed some other way. The key is found once, before a batch
of rows. It cannot be done less often than that, because the next row event can be for a different table and have a different row format.
I hope RPL_TABLE_LIST can be preserved between the events. Is it so?
-- Yours truly, Nikita Malyavin
Hi, Nikita, On Oct 31, Nikita Malyavin wrote:
On Mon, 31 Oct 2022 at 14:08, Sergei Golubchik <serg@mariadb.org> wrote:
On Oct 31, Nikita Malyavin wrote:
Also, I've looked at your latest branch. What were you optimizing with the commit 3afa3288221 (the one with usable_keys_data)?
It's complex and I don't quite see what's the purpose of it. It looks like all you need to do is to decide on the best index to use, once, save it somewhere, e.g. in RPL_TABLE_LIST, and that's it.
This can be done, for example, in copy_data_between_tables().
For ONLINE ALTER TABLE yes, but what about usual replication? I'd prefer one generic algorithm for everything.
For replication it already works.
It didn't work for cases with extra columns, when master a primary key, or some other key was amended with an extra column, for example. Or when the set of keys has been changed some other way.
Yes, replication didn't support those cases. It didn't look like users care, though.
The key is found once, before a batch of rows. It cannot be done less often than that, because the next row event can be for a different table and have a different row format.
I hope RPL_TABLE_LIST can be preserved between the events. Is it so?
Yes, it appears to be created in Table_map_log_event and removed on commit. That is, it survives multiple row events. Still, there are only two possibilities: * the table has a PK - then row format doesn't matter, just use a PK * the table has no PK - then row format doesn't matter, row pre-image will be always FULL So, you're right that in the replication case it's also enough to calculate the best key once, store it in RPL_TABLE_LIST, and not recalculate it per event. But even then it doesn't need a complex cache, but just one uint key_to_use; in RPL_TABLE_LIST. You only need some complex logic if the table has PK on the master, row image is not FULL, and you cannot use this PK on the slave. This use case wasn't supported before, it's not needed for online alter, so I suggest we won't try to do it together with already big and complex online alter task. Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
On Mon, 31 Oct 2022 at 21:22, Sergei Golubchik <serg@mariadb.org> wrote:
I hope RPL_TABLE_LIST can be preserved between the events. Is it so?
Yes, it appears to be created in Table_map_log_event and removed on commit. That is, it survives multiple row events.
Still, there are only two possibilities:
* the table has a PK - then row format doesn't matter, just use a PK * the table has no PK - then row format doesn't matter, row pre-image will be always FULL
Sooo we can always just use the first key we have, assuming PK always comes first..
So, you're right that in the replication case it's also enough to calculate the best key once, store it in RPL_TABLE_LIST, and not recalculate it per event.
But even then it doesn't need a complex cache, but just one
uint key_to_use;
in RPL_TABLE_LIST.
Agree, one key is enough to store. I also wanted to ask: -- What if ROW_FORMAT was changed in between of a transaction? But I think we don't care for now, as we're gonna assume taht salve keys are not being changed and always match master's schema.
You only need some complex logic if the table has PK on the master, row image is not FULL, and you cannot use this PK on the slave. This use case wasn't supported before, it's not needed for online alter, so I suggest we won't try to do it together with already big and complex online alter task.
By complex logic, do you mean all that bitmap comparison I made? Then yes, agree. Let me simplify that -- Yours truly, Nikita Malyavin
Hi, Nikita, On Nov 01, Nikita Malyavin wrote:
On Mon, 31 Oct 2022 at 21:22, Sergei Golubchik <serg@mariadb.org> wrote:
I hope RPL_TABLE_LIST can be preserved between the events. Is it so?
Yes, it appears to be created in Table_map_log_event and removed on commit. That is, it survives multiple row events.
Still, there are only two possibilities:
* the table has a PK - then row format doesn't matter, just use a PK * the table has no PK - then row format doesn't matter, row pre-image will be always FULL
Sooo we can always just use the first key we have, assuming PK always comes first..
Not quite. If there is no PK, we should use the key with the highest cardinality.
So, you're right that in the replication case it's also enough to calculate the best key once, store it in RPL_TABLE_LIST, and not recalculate it per event.
But even then it doesn't need a complex cache, but just one
uint key_to_use;
in RPL_TABLE_LIST.
Agree, one key is enough to store.
I also wanted to ask: -- What if ROW_FORMAT was changed in between of a transaction?
Doesn't matter, see above. If the table had PK, then simpy use PK, even if ROW_FORMAT changes. If the table had no PK, then binlog will always have the full row, no matter how you'll be changing ROW_FORMAT. Regards, Sergei VP of MariaDB Server Engineering and security@mariadb.org
Sergei, I made it as we agreed, please see the new commit: 9d04b702 -- Yours truly, Nikita Malyavin
participants (2)
-
Nikita Malyavin
-
Sergei Golubchik