December 2020 - 6 participants, 20 discussions
Re: [Maria-developers] 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
by Sergei Golubchik, 21 Jul '21
Hi, Aleksey!
Don't be confused: the commit header and the comment come from your last
commit, but the diff below includes all three commits that mention MDEV-18734.
On Nov 02, Aleksey Midenkov wrote:
> revision-id: 2f9bbf392072 (mariadb-10.2.31-543-g2f9bbf392072)
> parent(s): 1f4960e3f2ac
> author: Aleksey Midenkov <midenok(a)gmail.com>
> committer: Aleksey Midenkov <midenok(a)gmail.com>
> timestamp: 2020-11-02 14:26:04 +0300
> message:
>
> MDEV-18734 optimization by taking ownership via String::swap()
>
> ha_partition stores records in array of m_ordered_rec_buffer and uses
> it for prio queue in ordered index scan. When the records are restored
> from the array the blob buffers may be already freed or
> rewritten. This can happen when blob buffers are cached in
> Field_blob::value or Field_blob::read_value.
>
> Previous solution worked via copying blob buffer into
> mem_root-allocated buffer. That was not optimal as it requires more
> memory and processing time.
>
> This patch avoids memory copy by taking temporary ownership of cached
> blob buffers via String::swap(). It also fixes problem when
> value.str_length is wrong after Field_blob::store().
> diff --git a/mysql-test/suite/federated/federated_partition.test b/mysql-test/suite/federated/federated_partition.test
> index ef1e27ec5054..33ee025442f4 100644
> --- a/mysql-test/suite/federated/federated_partition.test
> +++ b/mysql-test/suite/federated/federated_partition.test
> @@ -50,4 +50,30 @@ drop table federated.t1_2;
>
> --echo End of 5.1 tests
>
> +--echo #
> +--echo # MDEV-18734 ASAN heap-use-after-free upon sorting by blob column from partitioned table
> +--echo #
> +connection slave;
> +use federated;
> +create table t1_1 (x int, b text, key(x));
> +create table t1_2 (x int, b text, key(x));
> +connection master;
> +--replace_result $SLAVE_MYPORT SLAVE_PORT
> +eval create table t1 (x int, b text, key(x)) engine=federated
> + partition by range columns (x) (
> + partition p1 values less than (40) connection='mysql://root@127.0.0.1:$SLAVE_MYPORT/federated/t1_1',
> + partition pn values less than (maxvalue) connection='mysql://root@127.0.0.1:$SLAVE_MYPORT/federated/t1_2'
> +);
> +insert t1 values (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8);
> +insert t1 select x + 8, x + 8 from t1;
> +insert t1 select x + 16, x + 16 from t1;
> +insert t1 select x + 49, repeat(x + 49, 100) from t1;
> +--echo # This produces wrong result before MDEV-17573
> +select * from t1;
What do you mean, a simple `select * from t1` produces an incorrect result?
> +flush tables;
> +select x, b from t1 where x > 30 and x < 60 order by b;
Is this an ASAN-only test?
> +drop table t1;
> +connection slave;
> +drop table t1_1, t1_2;
> +
> source include/federated_cleanup.inc;
> diff --git a/mysql-test/suite/vcol/t/partition.test b/mysql-test/suite/vcol/t/partition.test
> index 889724fb1c5c..f10ee0491ccc 100644
> --- a/mysql-test/suite/vcol/t/partition.test
> +++ b/mysql-test/suite/vcol/t/partition.test
> @@ -30,3 +30,28 @@ subpartition by hash(v) subpartitions 3 (
> insert t1 set i= 0;
> set statement sql_mode= '' for update t1 set i= 1, v= 2;
> drop table t1;
> +
> +--echo #
> +--echo # MDEV-18734 ASAN heap-use-after-free in my_strnxfrm_simple_internal upon update on versioned partitioned table
> +--echo #
> +--echo # Cover queue_fix() in ha_partition::handle_ordered_index_scan()
> +create table t1 (
> + x int auto_increment primary key,
> + b text, v mediumtext as (b) virtual,
> + index (v(10))
> +) partition by range columns (x) (
> + partition p1 values less than (3),
> + partition p2 values less than (6),
> + partition pn values less than (maxvalue));
> +insert into t1 (b) values ('q'), ('a'), ('b');
> +update t1 set b= 'bar' where v > 'a';
> +drop table t1;
This test didn't fail for me in an ASAN build
on the vanilla 10.5 without your fix.
That is, I'm not sure it actually tests the fix.
> +
> +--echo # Cover return_top_record() in ha_partition::handle_ordered_index_scan()
> +create table t1 (x int primary key, b tinytext, v text as (b) virtual)
> +partition by range columns (x) (
> + partition p1 values less than (4),
> + partition pn values less than (maxvalue));
> +insert into t1 (x, b) values (1, ''), (2, ''), (3, 'a'), (4, 'b');
> +update t1 set b= 'bar' where x > 0 order by v limit 2;
This fails without the fix all right.
Still, I wonder whether it's possible to create a test that'll fail
in a normal optimized build w/o ASAN.
> +drop table t1;
> diff --git a/sql/field.cc b/sql/field.cc
> index bdaaecc20269..0fd40c979d2c 100644
> --- a/sql/field.cc
> +++ b/sql/field.cc
> @@ -8310,6 +8310,7 @@ int Field_blob::store(const char *from,uint length,CHARSET_INFO *cs)
> copy_length= copier.well_formed_copy(field_charset,
> (char*) value.ptr(), new_length,
> cs, from, length);
> + value.length(copy_length);
good! Could this be a distinct bug with its own test case?
> Field_blob::store_length(copy_length);
> bmove(ptr+packlength,(uchar*) &tmp,sizeof(char*));
>
> diff --git a/sql/ha_partition.cc b/sql/ha_partition.cc
> index 9d82df0235c7..7c658082d397 100644
> --- a/sql/ha_partition.cc
> +++ b/sql/ha_partition.cc
> @@ -5125,14 +5128,14 @@ bool ha_partition::init_record_priority_queue()
> uint alloc_len;
> uint used_parts= bitmap_bits_set(&m_part_info->read_partitions);
> /* Allocate record buffer for each used partition. */
> - m_priority_queue_rec_len= m_rec_length + PARTITION_BYTES_IN_POS;
> + m_priority_queue_rec_len= m_rec_length + ORDERED_REC_OFFSET;
> if (!m_using_extended_keys)
> m_priority_queue_rec_len += m_file[0]->ref_length;
> alloc_len= used_parts * m_priority_queue_rec_len;
> /* Allocate a key for temporary use when setting up the scan. */
> alloc_len+= table_share->max_key_length;
>
> - if (!(m_ordered_rec_buffer= (uchar*)my_malloc(alloc_len, MYF(MY_WME))))
> + if (!(m_ordered_rec_buffer= (uchar*) alloc_root(&m_ordered_root, alloc_len)))
I don't see why you need a memroot here. A memroot is needed when you
later allocate, at different points in time, some initially unpredictable
amount of memory that has the same lifetime and needs to be
freed all at once.
But here it seems that you only allocate once. You could just make
m_priority_queue_rec_len = m_rec_length + PARTITION_BYTES_IN_POS +
table->s->blob_fields * sizeof(Ordered_blob_storage)
(plus necessary alignment), see the sketch below.
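In code, roughly (just a sketch; using MY_ALIGN for the alignment part is
my assumption, adjust as needed):

    /* one flat buffer: each per-partition slot holds the position bytes,
       the row, and one Ordered_blob_storage per blob field */
    m_priority_queue_rec_len=
      MY_ALIGN(m_rec_length + PARTITION_BYTES_IN_POS +
               table->s->blob_fields * sizeof(Ordered_blob_storage),
               sizeof(void *));
    if (!m_using_extended_keys)
      m_priority_queue_rec_len+= m_file[0]->ref_length;
    alloc_len= used_parts * m_priority_queue_rec_len;
    alloc_len+= table_share->max_key_length;
    if (!(m_ordered_rec_buffer= (uchar*) my_malloc(alloc_len, MYF(MY_WME))))
      DBUG_RETURN(true);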
> DBUG_RETURN(true);
>
> /*
> @@ -6178,7 +6203,11 @@ int ha_partition::handle_ordered_index_scan(uchar *buf, bool reverse_order)
> */
> error= file->read_range_first(m_start_key.key? &m_start_key: NULL,
> end_range, eq_range, TRUE);
> - memcpy(rec_buf_ptr, table->record[0], m_rec_length);
> + if (!error)
> + {
> + memcpy(rec_buf_ptr, table->record[0], m_rec_length);
> + }
did you have a bug because of that? something didn't work?
> +
> reverse_order= FALSE;
> break;
> }
> @@ -6310,6 +6349,43 @@ int ha_partition::handle_ordered_index_scan_key_not_found()
> }
>
>
> +void ha_partition::swap_blobs(uchar * rec_buf, Ordered_blob_storage ** storage, bool restore)
> +{
> + uint *ptr, *end;
> + uint blob_n= 0;
> + table->move_fields(table->field, rec_buf, table->record[0]);
> + for (ptr= table->s->blob_field, end= ptr + table->s->blob_fields;
> + ptr != end; ++ptr, ++blob_n)
> + {
> + DBUG_ASSERT(*ptr < table->s->fields);
> + Field_blob *blob= (Field_blob*) table->field[*ptr];
> + DBUG_ASSERT(blob->flags & BLOB_FLAG);
> + DBUG_ASSERT(blob->field_index == *ptr);
> + if (!bitmap_is_set(table->read_set, *ptr) || blob->is_null())
> + continue;
> +
> + Ordered_blob_storage &s= *storage[blob_n];
> +
> + if (restore)
> + {
> + if (!s.blob.is_empty())
> + blob->swap(s.blob, s.set_read_value);
looks somewhat risky: why does blob.length == 0 mean that you don't
need to restore the value?
> + }
> + else
> + {
> + bool set_read_value;
> + String *cached= blob->cached(set_read_value);
> + if (cached)
> + {
> + cached->swap(s.blob);
> + s.set_read_value= set_read_value;
is it indeed possible here for a value to be either in `value` or in
`read_value`?
When does each case happen?
> + }
> + }
> + }
> + table->move_fields(table->field, table->record[0], rec_buf);
> +}
> +
> +
> /*
> Common routine to handle index_next with ordered results
>
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Re: [Maria-developers] 27060eb6ba5: MDEV-21916: COM_STMT_BULK_EXECUTE with RETURNING insert wrong values
by Sergei Golubchik, 10 Jun '21
Hi, Oleksandr!
On Oct 12, Oleksandr Byelkin wrote:
> revision-id: 27060eb6ba5 (mariadb-10.5.4-220-g27060eb6ba5)
> parent(s): 861cd4ce286
> author: Oleksandr Byelkin <sanja(a)mariadb.com>
> committer: Oleksandr Byelkin <sanja(a)mariadb.com>
> timestamp: 2020-10-07 15:22:41 +0200
> message:
>
> MDEV-21916: COM_STMT_BULK_EXECUTE with RETURNING insert wrong values
>
> To allocate new net buffer to avoid changing bufer we are reading.
You still didn't clarify the commit comment
> diff --git a/sql/sql_delete.cc b/sql/sql_delete.cc
> index 7280236e43f..5aaff3cf623 100644
> --- a/sql/sql_delete.cc
> +++ b/sql/sql_delete.cc
> @@ -685,8 +685,14 @@ bool mysql_delete(THD *thd, TABLE_LIST *table_list, COND *conds,
> !table->prepare_triggers_for_delete_stmt_or_event())
> will_batch= !table->file->start_bulk_delete();
>
> - if (returning)
> + /*
> + thd->get_stmt_da()->is_set() means first iteration of prepared statement
> + with array binding operation execution (non optimized so it is not
> + INSERT)
> + */
> + if (returning && !thd->get_stmt_da()->is_set())
> {
> + DBUG_ASSERT(thd->lex->sql_command != SQLCOM_INSERT);
a strange assert to see in sql_delete.cc :)
can one even reach mysql_delete() with SQLCOM_INSERT?
and not just with returning and !thd->get_stmt_da()->is_set() - anywhere at all?
> if (result->send_result_set_metadata(returning->item_list,
> Protocol::SEND_NUM_ROWS | Protocol::SEND_EOF))
> goto cleanup;
> diff --git a/sql/sql_error.cc b/sql/sql_error.cc
> index b3ef0d89a98..a753af2b34d 100644
> --- a/sql/sql_error.cc
> +++ b/sql/sql_error.cc
> @@ -380,16 +380,33 @@ Diagnostics_area::set_eof_status(THD *thd)
> if (unlikely(is_error() || is_disabled()))
> return;
>
> + if (m_status == DA_EOF_BULK)
> + {
> /*
> If inside a stored procedure, do not return the total
> number of warnings, since they are not available to the client
> anyway.
> */
> + if (!thd->spcont)
> + m_statement_warn_count+= current_statement_warn_count();
> + }
> + else
> + {
> + /*
> + If inside a stored procedure, do not return the total
> + number of warnings, since they are not available to the client
> + anyway.
> + */
I don't think it helps to duplicate the comment. You could just put it
once before the if.
> if (thd->spcont)
> {
> m_statement_warn_count= 0;
> + m_affected_rows= 0;
why do you reset m_affected_rows too?
> }
> else
> m_statement_warn_count= current_statement_warn_count();
> + m_status= (is_bulk_op() ? DA_EOF_BULK : DA_EOF);
> + }
do we have tests for bulk operations and CALL?
>
> - m_status= DA_EOF;
> DBUG_VOID_RETURN;
> }
>
> diff --git a/sql/sql_prepare.cc b/sql/sql_prepare.cc
> index ecb56e70f88..c144d3a8d7e 100644
> --- a/sql/sql_prepare.cc
> +++ b/sql/sql_prepare.cc
> @@ -4357,24 +4361,37 @@ Prepared_statement::execute_bulk_loop(String *expanded_query,
>
> + /*
> + Here second buffer for not optimized commands,
> + optimized commands do it inside thier internal loop.
> + */
> + if (!(sql_command_flags[lex->sql_command] & CF_SP_BULK_OPTIMIZED) &&
why "SP" and not "PS" ?
> + this->lex->has_returning())
what about CALL? It won't have lex->has_returning(). What about SELECT?
Can you bind an array to a parameter in SELECT or CALL?
> + {
> + // Above check can be true for SELECT in future
> + DBUG_ASSERT(lex->sql_command != SQLCOM_SELECT);
how can SQLCOM_SELECT have lex->has_returning() ?
> + readbuff= thd->net.buff; // old buffer
> + if (net_allocate_new_packet(&thd->net, thd, MYF(MY_THREAD_SPECIFIC)))
> + {
> + readbuff= NULL; // failure, net_allocate_new_packet keeps old buffer
> + goto err;
> + }
> }
>
> #ifndef EMBEDDED_LIBRARY
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Re: [Maria-developers] ef2519fee4e: MDEV-16546 System versioning setting to allow history modification
by Aleksey Midenkov, 08 Jun '21
Hello, Sergei!
On Fri, May 3, 2019 at 8:43 PM Sergei Golubchik <serg(a)mariadb.com> wrote:
>
> Hi, Aleksey!
>
> On May 03, Aleksey Midenkov wrote:
> > revision-id: ef2519fee4e (versioning-1.0.5-17-gef2519fee4e)
> > parent(s): 56145be2951
> > author: Aleksey Midenkov <midenok(a)gmail.com>
> > committer: Aleksey Midenkov <midenok(a)gmail.com>
> > timestamp: 2018-06-28 13:42:09 +0300
> > message:
> >
> > MDEV-16546 System versioning setting to allow history modification
> >
> > 1. Add server variable system_versioning_modify_history which will
> > allow to set values for row_start, row_end in DML operations.
> >
> > 2. If secure_timestamp is YES or REPLICATION,
> > system_versioning_modify_history does not have effect. If
> > secure_timestamp is SUPER, system_versioning_modify_history requires
> > special privilege (same as for setting current timestamp).
>
> I thought more about this idea. We don't really want to have the history
> editable, do we?
Well, I'm thinking about rolling back table data to a specific point in
time. That could be a useful feature.
> But it's needed for replication, to keep the master and
> slave identical. That's what secure_timestamp is for.
>
> The idea was that this new variable, system_versioning_modify_history,
> will be just a convenience feature, it will not allow history editing
> any more than one can do without it.
>
> But now I suspect that even with secure_timestamp=NO one cannot truly
> edit history. One can only insert new rows with arbitrary timestamps.
> For example, to insert a row with row_start=1000 and row_end=2000, one
> needs to do (if secure_timestamp=NO):
>
> set timestamp=1000;
> insert row;
> set timestamp=2000;
> delete row;
>
> But I don't see how one can update or delete a history row with
> secure_timestamp=NO.
>
> Now, with a SUPER privilege and secure_timestamp=NO or SUPER, one can
> use the BINLOG command and truly edit the history arbitrarily, by faking
> row events.
I don't really get why this is so important: since there is some
limitation by configuration and privilege, we are just fine.
Everything can be changed at the filesystem level after all.
>
> The conclusion, I believe, is that system_versioning_modify_history
> should allow INSERTs when secure_timestamp=NO, and it should allow
> UPDATE/DELETE only for a SUPER user when secure_timestamp=NO or SUPER.
I don't see a reason to argue about that. The only thing that is not
clear: why don't we allow INSERTs when secure_timestamp=SUPER?
>
> The second thing I don't like at all, is when a table is created like
>
> CREATE TABLE t1 (a int) WITH SYSTEM VERSIONING
>
> with row_start/row_end implicit. You don't have it in the test, but
> anyway one should be able to load history into such a table, while the
> table does not have row_start and row_end columns. From the user point
> of view these columns don't exist, they're pseudo-columns, like ROWID.
> They just cannot be insertable-into, conceptually. But a user will want
> to restore the history, right? I don't have a solution for this yet :(
> Any ideas?
We don't have to follow the concept if it doesn't help us. Since we
have physical row_start/row_end, we don't have to pretend they don't
exist. Who would benefit from that?
>
> See below a couple of minor comments about the patch itself.
>
> ...
These are going to be fixed.
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and security(a)mariadb.org
--
All the best,
Aleksey Midenkov
@midenok
31 May '21
Hi Alexey and everyone,
I was looking at whether it's possible to use JSON_TABLE to extract portions
of a JSON document. Apparently it is possible in MySQL with the JSON datatype:
Q1:
select *
from
json_table('[{"color": "blue", "price": { "high": 10, "low": 5}},
{"color": "red", "price": { "high": 20, "low": 8}}]',
'$[*]' columns(color varchar(100) path '$.color',
price json path '$.price'
)
) as T;
+-------+------------------------+
| color | price |
+-------+------------------------+
| blue | {"low": 5, "high": 10} |
| red | {"low": 8, "high": 20} |
+-------+------------------------+
Note that if one uses any datatype other than JSON, they get NULLs:
Q2:
select *
from
json_table('[{"color": "blue", "price": { "high": 10, "low": 5}},
{"color": "red", "price": { "high": 20, "low": 8}}]',
'$[*]' columns(color varchar(100) path '$.color',
price text path '$.price'
)
) as T;
+-------+-------+
| color | price |
+-------+-------+
| blue | NULL |
| red | NULL |
+-------+-------+
Oracle-the-database doesn't yet(*) have a JSON datatype, so I can only run Q2,
and then I get NULLs in the price column.
MariaDB accepts JSON as a datatype, so query Q1 is accepted.
However, the logic in the MDEV-17399 code doesn't support dumping a portion
of a JSON document, so one gets empty strings in the price column.
Should we support Q1 with JSON output in the price column? If yes, should we
do it within the scope of MDEV-17399 or create another task for this?
(*) I see this:
https://docs.oracle.com/en/database/oracle/oracle-database/20/newft/new-jso…
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
Re: [Maria-developers] 21eb8969ce9: Improved storage size for Item, Field and some other classes
by Sergei Golubchik, 14 May '21
Hi, Monty!
Looks ok, but again it doesn't seem you've squashed intermediate commits
as you said you did.
Bit fields and non-existent commits in columnstore - it's clearly an
intermediate work-in-progress state, all fixed in your later commits.
On Sep 08, Michael Widenius wrote:
> revision-id: 21eb8969ce9 (mariadb-10.5.2-270-g21eb8969ce9)
> parent(s): c3ecf0d6243
> author: Michael Widenius <michael.widenius(a)gmail.com>
> committer: Michael Widenius <michael.widenius(a)gmail.com>
> timestamp: 2020-09-02 20:58:34 +0300
> message:
>
> Improved storage size for Item, Field and some other classes
>
> - Changed order of class fields to remove dead alignment space.
> - Changed bool fields in Item to bit fields.
> - Used packed enum's for some fields in common classes
> - Removed not used Item::rsize.
> - Changed some class variables from uint/int to smaller type int's.
> - Ensured that field_index is uint16 in all classes and functions. Fixed
> also that we proparly compare with NO_CACHED_FIELD_INDEX when checking
> if variable is not set.
> - Removed checking of highest bit of unireg_check (has not been used in
> a long time)
> - Fixed wrong arguments to make_cond_for_table() for join_tab_idx_arg
> from false to 0.
>
> One of the result was reducing the size if class Item with ~24 bytes
...
> @@ -929,18 +904,48 @@ class Item: public Value_source,
> +
> + /*
> + str_values's main purpose is to be used to cache the value in
> + save_in_field
> + */
> + String str_value;
> +
> + LEX_CSTRING name; /* Name of item */
> + /* Original item name (if it was renamed)*/
> + const char *orig_name;
> +
> + uint32 /* All common bool variables for Item stored here */
> + maybe_null:1, /* If item may be null */
> + in_rollup:1, /* If used in GROUP BY list of a query with ROLLUP */
> + with_param:1, /* True if Item contains an SP parameter */
> + with_window_func:1, /* True if item contains a window func */
> + with_field:1, /* True if any item except Item_sum contains a field.
> + Set during parsing. */
> + fixed:1; /* If item was fixed with fix_fields */
> +
> + int16 marker;
> +
> diff --git a/storage/columnstore/columnstore b/storage/columnstore/columnstore
> index b6b02ed516f..f606e76fb77 160000
> --- a/storage/columnstore/columnstore
> +++ b/storage/columnstore/columnstore
> @@ -1 +1 @@
> -Subproject commit b6b02ed516f92055127d416370799d91a82754ea
> +Subproject commit f606e76fb779e40f3376693fff9969e4f2b7669a
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Re: [Maria-developers] 4088269c578d: Improved storage size for Item, Field and some other classes
by Sergei Golubchik, 14 May '21
Hi, Michael!
On Dec 03, Michael Widenius wrote:
> revision-id: 4088269c578d (mariadb-10.5.2-270-g4088269c578d)
> parent(s): 994ea2af3973
> author: Michael Widenius <michael.widenius(a)gmail.com>
> committer: Michael Widenius <michael.widenius(a)gmail.com>
> timestamp: 2020-09-17 12:25:44 +0300
> message:
>
> Improved storage size for Item, Field and some other classes
>
> diff --git a/storage/columnstore/columnstore b/storage/columnstore/columnstore
> index b6b02ed516f9..f606e76fb779 160000
> --- a/storage/columnstore/columnstore
> +++ b/storage/columnstore/columnstore
> @@ -1 +1 @@
> -Subproject commit b6b02ed516f92055127d416370799d91a82754ea
> +Subproject commit f606e76fb779e40f3376693fff9969e4f2b7669a
Again. You should not push invalid commit hashes into the main branch:
storage/columnstore/columnstore $ git fetch origin
storage/columnstore/columnstore $ git show f606e76fb779e40f3376693fff9969e4f2b7669a
fatal: bad object f606e76fb779e40f3376693fff9969e4f2b7669a
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Re: [Maria-developers] 9bf4b92cbc5: MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
by Sergei Golubchik, 31 Mar '21
Hi, Aleksey!
On Jun 29, Aleksey Midenkov wrote:
> revision-id: 9bf4b92cbc5 (mariadb-10.5.2-168-g9bf4b92cbc5)
> parent(s): 478301d9b9a
> author: Aleksey Midenkov <midenok(a)gmail.com>
> committer: Aleksey Midenkov <midenok(a)gmail.com>
> timestamp: 2020-04-17 17:04:24 +0300
> message:
>
> MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
>
> == Syntax change ==
>
> Keyword AUTO_INCREMENT (or AUTO) enables partition auto-creation.
I still think that AUTO_INCREMENT here looks rather out of place;
I don't know how to explain to users why AUTO_INCREMENT is allowed here.
And if we have it, we'll have to support it for a long time
for compatibility reasons, because somebody might already be using it.
I'd suggest supporting only AUTO here.
Or, if the parser can do it easily, a much more readable syntax would be
create table t1 (x int) with system versioning
auto partition by system_time interval 1 hour;
this would be the best: easy to read, and it matches the documentation
(where it could be called "auto partitioning"). If it won't cause any
difficult parser problems, I'd prefer us to use that.
> create or replace table t1 (x int) with system versioning
> partition by system_time interval 1 hour auto_increment;
>
> create or replace table t1 (x int) with system versioning
> partition by system_time limit 1000 auto;
>
> Or with explicit partitions:
>
> create or replace table t1 (x int) with system versioning
> partition by system_time interval 1 hour auto
> (partition p0 history, partition pn current);
>
> == Description ==
>
> Before executing history-generating DML command add N history
> partitions, so that N would be sufficient for potentially generated
> history. N > 1 may be required when history is rotated by INTERVAL and
> timestamp was jumped to future further than interval value.
>
> It is assumed that one DML command will not generate history rows more
> than specified LIMIT. Otherwise history would overflow generated
> partition, a warning would be printed, and user action would be
> required to rebuild partitions.
>
> Auto-creation is implemented by synchronous
> fast_alter_partition_table() call from the thread of the executed DML
> command before the command itself (by the fallback and retry mechanism
> similar to Discovery feature, see Open_table_context).
>
> Creating history partitions was made by the principle of minimal
> disruption of the main business process. I.e. when procedure of
> creation fails it will not (if possible) fail the original DML
> command. Warning info will display what happened and the last history
> partition may overflow. In such case partition rebuild is required to
> correctly display history info. User application may detect warning
> code ER_VERS_HIST_PART_FAILED and stop execution if that is preferred.
>
> The name for newly added partitions are generated like default
> partition names with extension of MDEV-22155 (which avoids name
> clashes by extending assignment counter to next free-enough gap).
>
> These DML commands trigger auto-creation:
>
> * DELETE (including multi-delete, excluding DELETE HISTORY)
> * UPDATE (including multi-update)
> * REPLACE (including REPLACE .. SELECT)
INSERT ... ON DUPLICATE KEY UPDATE ?
> diff --git a/mysql-test/suite/versioning/common.inc b/mysql-test/suite/versioning/common.inc
> index 355b571e5a0..b35a5138015 100644
> --- a/mysql-test/suite/versioning/common.inc
> +++ b/mysql-test/suite/versioning/common.inc
> @@ -6,6 +6,7 @@ if (!$TEST_VERSIONING_SO)
> source include/have_innodb.inc;
>
> set @@session.time_zone='+00:00';
> +set @@global.time_zone='+00:00';
Why is that? I'd expect your auto-adding of partitions to happen in the
context of a session, using the session time zone.
Is it for replication?
> select ifnull(max(transaction_id), 0) into @start_trx_id from mysql.transaction_registry;
> set @test_start=now(6);
>
> diff --git a/mysql-test/suite/versioning/r/delete_history.result b/mysql-test/suite/versioning/r/delete_history.result
> index cb865a835b3..90c9e4777bb 100644
> --- a/mysql-test/suite/versioning/r/delete_history.result
> +++ b/mysql-test/suite/versioning/r/delete_history.result
> @@ -63,8 +63,6 @@ insert into t values (1);
> update t set a= 2;
> update t set a= 3;
> delete history from t;
> -Warnings:
> -Warning 4114 Versioned table `test`.`t`: last HISTORY partition (`p1`) is out of LIMIT, need more HISTORY partitions
good riddance, it didn't make much sense anyway
> # The above warning is one command late (MDEV-20345) ^^^
> select * from t for system_time all;
> a
> diff --git a/sql/ha_partition.h b/sql/ha_partition.h
> index 85cb736b5bd..5fb893b4f0e 100644
> --- a/sql/ha_partition.h
> +++ b/sql/ha_partition.h
> @@ -1623,7 +1623,7 @@ class ha_partition :public handler
could you also, please, fix (in a separate commit)
ha_partition::part_records() to take a proper
partition_element* argument, not void* ?
> for (; part_id < part_id_end; ++part_id)
> {
> handler *file= m_file[part_id];
> - DBUG_ASSERT(bitmap_is_set(&(m_part_info->read_partitions), part_id));
> + bitmap_set_bit(&(m_part_info->read_partitions), part_id);
> file->info(HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK | HA_STATUS_OPEN);
> part_recs+= file->stats.records;
> }
> diff --git a/sql/partition_info.cc b/sql/partition_info.cc
> index e9dbf2b49c3..72312ee7ac4 100644
> --- a/sql/partition_info.cc
> +++ b/sql/partition_info.cc
> @@ -814,10 +814,16 @@ bool partition_info::has_unique_name(partition_element *element)
> vers_info->interval Limit by fixed time interval
> vers_info->hist_part (out) Working history partition
> */
> -void partition_info::vers_set_hist_part(THD *thd)
> +uint partition_info::vers_set_hist_part(THD *thd, bool auto_inc)
let's not call it auto_inc, it'll add noise to every grep for auto_inc
issues. What about auto_add?
> {
> + DBUG_ASSERT(!thd->lex->last_table()->vers_conditions.delete_history);
> +
> + uint create_count= 0;
> + auto_inc= auto_inc && vers_info->auto_inc;
> +
> if (vers_info->limit)
> {
> + DBUG_ASSERT(!vers_info->interval.is_set());
> ha_partition *hp= (ha_partition*)(table->file);
> partition_element *next= NULL;
> List_iterator<partition_element> it(partitions);
> @@ -836,22 +842,26 @@ void partition_info::vers_set_hist_part(THD *thd)
> {
> if (next == vers_info->now_part)
> {
> - my_error(WARN_VERS_PART_FULL, MYF(ME_WARNING|ME_ERROR_LOG),
> - table->s->db.str, table->s->table_name.str,
> - vers_info->hist_part->partition_name, "LIMIT");
> + if (auto_inc)
> + create_count= 1;
> + else
> + my_error(WARN_VERS_PART_FULL, MYF(ME_WARNING|ME_ERROR_LOG),
> + table->s->db.str, table->s->table_name.str,
> + vers_info->hist_part->partition_name, "LIMIT");
> }
> else
> vers_info->hist_part= next;
> }
> - return;
> + // reserve at least one history partition
> + if (auto_inc && create_count == 0 &&
> + vers_info->hist_part->id + 1 == vers_info->now_part->id)
> + create_count= 1;
Questionable. What does it solve?
> }
> -
> - if (vers_info->interval.is_set())
> + else if (vers_info->interval.is_set() &&
> + vers_info->hist_part->range_value <= thd->query_start())
> {
> - if (vers_info->hist_part->range_value > thd->query_start())
> - return;
> -
> partition_element *next= NULL;
> + bool error= true;
> List_iterator<partition_element> it(partitions);
> while (next != vers_info->hist_part)
> next= it++;
> @@ -860,12 +870,200 @@ void partition_info::vers_set_hist_part(THD *thd)
> {
> vers_info->hist_part= next;
> if (next->range_value > thd->query_start())
> - return;
> + {
> + error= false;
> + break;
but here you don't "reserve at least one history partition".
Why is INTERVAL different?
> + }
> + }
> + if (error)
> + {
> + if (auto_inc)
> + {
> + DBUG_ASSERT(thd->query_start() >= vers_info->hist_part->range_value);
> + my_time_t diff= thd->query_start() - vers_info->hist_part->range_value;
> + if (diff > 0)
> + {
> + size_t delta= vers_info->interval.seconds();
> + create_count= diff / delta + 1;
> + if (diff % delta)
> + create_count++;
> + }
> + else
> + create_count= 1;
> + }
> + else
> + {
> + my_error(WARN_VERS_PART_FULL, MYF(ME_WARNING|ME_ERROR_LOG),
> + table->s->db.str, table->s->table_name.str,
> + vers_info->hist_part->partition_name, "INTERVAL");
> + }
> }
> - my_error(WARN_VERS_PART_FULL, MYF(ME_WARNING|ME_ERROR_LOG),
> - table->s->db.str, table->s->table_name.str,
> - vers_info->hist_part->partition_name, "INTERVAL");
> }
> +
> + return create_count;
> +}
> +
> +
> +/**
> + @brief Run fast_alter_partition_table() to add new history partitions
> + for tables requiring them.
> +*/
> +void vers_add_auto_parts(THD *thd, TABLE_LIST* tl, uint num_parts)
> +{
> + HA_CREATE_INFO create_info;
> + Alter_info alter_info;
> + String query;
> + partition_info *save_part_info= thd->work_part_info;
> + Query_tables_list save_query_tables;
> + Reprepare_observer *save_reprepare_observer= thd->m_reprepare_observer;
> + Diagnostics_area new_stmt_da(thd->query_id, false, true);
> + Diagnostics_area *save_stmt_da= thd->get_stmt_da();
> + bool save_no_write_to_binlog= thd->lex->no_write_to_binlog;
> + const CSET_STRING save_query= thd->query_string;
> + thd->m_reprepare_observer= NULL;
> + thd->lex->reset_n_backup_query_tables_list(&save_query_tables);
> + thd->in_sub_stmt|= SUB_STMT_AUTO_HIST;
> + thd->lex->no_write_to_binlog= !thd->is_current_stmt_binlog_format_row();
> + TABLE *table= tl->table;
> +
> + DBUG_ASSERT(!thd->is_error());
> + /* NB: we have to preserve m_affected_rows, m_row_count_func, m_last_insert_id, etc */
> + thd->set_stmt_da(&new_stmt_da);
> + new_stmt_da.set_overwrite_status(true);
Is it needed?
You've just started using a new Diagnostics_area, the status should be
DA_EMPTY here, nothing to overwrite.
> +
> + {
> + DBUG_ASSERT(table->s->get_table_ref_type() == TABLE_REF_BASE_TABLE);
> + DBUG_ASSERT(table->versioned());
> + DBUG_ASSERT(table->part_info);
> + DBUG_ASSERT(table->part_info->vers_info);
> + alter_info.reset();
> + alter_info.partition_flags= ALTER_PARTITION_ADD|ALTER_PARTITION_AUTO_HIST;
> + create_info.init();
> + create_info.alter_info= &alter_info;
> + Alter_table_ctx alter_ctx(thd, tl, 1, &table->s->db, &table->s->table_name);
> +
> + MDL_REQUEST_INIT(&tl->mdl_request, MDL_key::TABLE, tl->db.str,
> + tl->table_name.str, MDL_SHARED_NO_WRITE, MDL_TRANSACTION);
> + if (thd->mdl_context.acquire_lock(&tl->mdl_request,
> + thd->variables.lock_wait_timeout))
> + goto exit;
> + table->mdl_ticket= tl->mdl_request.ticket;
> +
> + create_info.db_type= table->s->db_type();
> + create_info.options|= HA_VERSIONED_TABLE;
> + DBUG_ASSERT(create_info.db_type);
> +
> + create_info.vers_info.set_start(table->s->vers_start_field()->field_name);
> + create_info.vers_info.set_end(table->s->vers_end_field()->field_name);
> +
> + partition_info *part_info= new partition_info();
> + if (unlikely(!part_info))
> + {
> + my_error(ER_OUT_OF_RESOURCES, MYF(0));
> + goto exit;
> + }
> + part_info->use_default_num_partitions= false;
> + part_info->use_default_num_subpartitions= false;
> + part_info->num_parts= num_parts;
> + part_info->num_subparts= table->part_info->num_subparts;
> + part_info->subpart_type= table->part_info->subpart_type;
> + if (unlikely(part_info->vers_init_info(thd)))
> + {
> + my_error(ER_OUT_OF_RESOURCES, MYF(0));
> + goto exit;
> + }
> +
> + // NB: set_ok_status() requires DA_EMPTY
> + thd->get_stmt_da()->reset_diagnostics_area();
Is it needed?
You've just started using a new Diagnostics_area, the status should be
DA_EMPTY here.
> +
> + thd->work_part_info= part_info;
> + if (part_info->set_up_defaults_for_partitioning(thd, table->file, NULL,
> + table->part_info->next_part_no(num_parts)))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN,
> + ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "setting up defaults failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> + bool partition_changed= false;
> + bool fast_alter_partition= false;
> + if (prep_alter_part_table(thd, table, &alter_info, &create_info,
> + &partition_changed, &fast_alter_partition))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "alter partitition prepare failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> + if (!fast_alter_partition)
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "fast alter partitition is not possible");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> + DBUG_ASSERT(partition_changed);
> + if (mysql_prepare_alter_table(thd, table, &create_info, &alter_info,
> + &alter_ctx))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "alter prepare failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> +
> + // Forge query string for rpl logging
> + if (!thd->lex->no_write_to_binlog)
> + {
> + query.set(STRING_WITH_LEN("ALTER TABLE `"), &my_charset_latin1);
better use StringBuffer<...> query;
because you're doing lots of reallocs now.
> +
> + if (query.append(table->s->db) ||
> + query.append(STRING_WITH_LEN("`.`")) ||
> + query.append(table->s->table_name) ||
this is wrong: the table and db names can have backticks inside.
Use append_identifier() instead, see the sketch below.
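Something like this (a sketch only; double-check the exact
append_identifier() overload available in your tree):

    StringBuffer<256> query(&my_charset_latin1); /* stack buffer, fewer reallocs */
    query.append(STRING_WITH_LEN("ALTER TABLE "));
    append_identifier(thd, &query, table->s->db.str, table->s->db.length);
    query.append('.');
    append_identifier(thd, &query, table->s->table_name.str,
                      table->s->table_name.length);
    query.append(STRING_WITH_LEN(" ADD PARTITION PARTITIONS "));
    query.append_ulonglong(part_info->num_parts);
    query.append(STRING_WITH_LEN(" AUTO"));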
> + query.append("` ADD PARTITION PARTITIONS ") ||
> + query.append_ulonglong(part_info->num_parts) ||
> + query.append(" AUTO"))
> + {
> + my_error(ER_OUT_OF_RESOURCES, MYF(ME_ERROR_LOG));
> + goto exit;
> + }
> + CSET_STRING qs(query.c_ptr(), query.length(), &my_charset_latin1);
> + thd->set_query(qs);
> + }
Should it be binlogged at all? Maybe just leave it to the slave to
auto-add the partition as needed?
Looks a bit suspicious now: you log an ALTER TABLE possibly in the
middle of a transaction. It will be replayed differently, not as
auto-adding; in particular, it will commit. So, possibly different
results on the slave, perhaps a different gtid.
Do clear it up with Andrei please. Or don't binlog auto-adding at all.
> +
> + if (fast_alter_partition_table(thd, table, &alter_info, &create_info,
> + tl, &table->s->db, &table->s->table_name))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "alter partition table failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + }
> + }
> +
> + if (!thd->transaction.stmt.is_empty())
> + trans_commit_stmt(thd);
1. why would this operation register a transaction?
2. you amended a couple of checks like
- if (thd->in_sub_stmt)
+ if (thd->in_sub_stmt & ~SUB_STMT_AUTO_HIST)
but what will happen if it's, for example, an insert inside a trigger,
i.e. a real sub_stmt combined with SUB_STMT_AUTO_HIST?
> +
> +exit:
> + thd->work_part_info= save_part_info;
> + thd->m_reprepare_observer= save_reprepare_observer;
> + thd->lex->restore_backup_query_tables_list(&save_query_tables);
> + thd->in_sub_stmt&= ~SUB_STMT_AUTO_HIST;
> + if (!new_stmt_da.is_warning_info_empty())
> + save_stmt_da->copy_sql_conditions_from_wi(thd, new_stmt_da.get_warning_info());
> + thd->set_stmt_da(save_stmt_da);
> + thd->lex->no_write_to_binlog= save_no_write_to_binlog;
> + thd->set_query(save_query);
> }
>
>
> diff --git a/sql/partition_info.h b/sql/partition_info.h
> index 7ae2d168068..ca68e61cbe2 100644
> --- a/sql/partition_info.h
> +++ b/sql/partition_info.h
> @@ -72,9 +73,34 @@ struct Vers_part_info : public Sql_alloc
> my_time_t start;
> INTERVAL step;
> enum interval_type type;
> - bool is_set() { return type < INTERVAL_LAST; }
> + bool is_set() const { return type < INTERVAL_LAST; }
> + size_t seconds() const
> + {
> + if (step.second)
> + return step.second;
> + if (step.minute)
> + return step.minute * 60;
> + if (step.hour)
> + return step.hour * 3600;
> + if (step.day)
> + return step.day * 3600 * 24;
> + // comparison is used in rough estimates, it doesn't need to be calendar-correct
Are you sure the approximate value is enough here? You use it to
estimate how many partitions to create. It's not a big deal if you
create more than necessary (although it's not nice, and we'll definitely
get bug reports if that happens). But you surely don't want to
create fewer; see the sketch below.
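Estimating with lower bounds would be safe here, as under-estimating the
interval length can only over-create partitions (a sketch, not tested):

    /* a shorter assumed interval => a larger create_count, which is safe */
    if (step.month)
      return step.month * 3600 * 24 * 28;  /* shortest month */
    DBUG_ASSERT(step.year);
    return step.year * 3600 * 24 * 365;    /* non-leap year */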
> + if (step.month)
> + return step.month * 3600 * 24 * 30;
> + DBUG_ASSERT(step.year);
> + return step.year * 86400 * 30 * 365;
> + }
> + bool lt(size_t secs) const
> + {
> + return seconds() < secs;
> + }
> + bool ge(size_t seconds) const
> + {
> + return !(this->lt(seconds));
> + }
lt() and ge() don't seem to be used anywhere.
> } interval;
> ulonglong limit;
> + bool auto_inc;
> partition_element *now_part;
> partition_element *hist_part;
> };
> diff --git a/sql/sql_base.cc b/sql/sql_base.cc
> index b89be77f282..5583d70f8eb 100644
> --- a/sql/sql_base.cc
> +++ b/sql/sql_base.cc
> @@ -1862,6 +1862,25 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx)
> DBUG_PRINT("info",("Using locked table"));
> #ifdef WITH_PARTITION_STORAGE_ENGINE
> part_names_error= set_partitions_as_used(table_list, table);
> + if (table->part_info &&
> + table->part_info->part_type == VERSIONING_PARTITION &&
> + !table_list->vers_conditions.delete_history &&
> + table_list->lock_type >= TL_WRITE_ALLOW_WRITE &&
> + table_list->mdl_request.type == MDL_SHARED_WRITE)
> + {
> + switch (thd->lex->sql_command)
> + {
> + case SQLCOM_DELETE:
> + case SQLCOM_UPDATE:
> + case SQLCOM_REPLACE:
> + case SQLCOM_REPLACE_SELECT:
> + case SQLCOM_DELETE_MULTI:
> + case SQLCOM_UPDATE_MULTI:
> + /* Rotation is still needed under LOCK TABLES */
> + table->part_info->vers_set_hist_part(thd, false);
ALTER TABLE works under LOCK TABLES, so this should be possible too,
but let's have it as a separate task.
> + default:;
> + }
> + }
> #endif
> goto reset;
> }
> @@ -2104,6 +2123,37 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx)
> tc_add_table(thd, table);
> }
>
> +#ifdef WITH_PARTITION_STORAGE_ENGINE
> + if (table->part_info &&
> + table->part_info->part_type == VERSIONING_PARTITION &&
> + !table_list->vers_conditions.delete_history &&
> + table_list->lock_type >= TL_WRITE_ALLOW_WRITE &&
> + table_list->mdl_request.type >= MDL_SHARED_WRITE &&
> + table_list->mdl_request.type < MDL_EXCLUSIVE)
> + {
> + switch (thd->lex->sql_command)
> + {
> + case SQLCOM_LOCK_TABLES:
> + case SQLCOM_DELETE:
> + case SQLCOM_UPDATE:
> + case SQLCOM_REPLACE:
> + case SQLCOM_REPLACE_SELECT:
> + case SQLCOM_DELETE_MULTI:
> + case SQLCOM_UPDATE_MULTI:
this is quite a complex condition, better not to duplicate it. Maybe
something like
{
if (table_list->table->part_info &&
table_list->table->part_info->part_type == VERSIONING_PARTITION &&
!table_list->vers_conditions.delete_history &&
table_list->lock_type >= TL_WRITE_ALLOW_WRITE &&
table_list->mdl_request.type >= MDL_SHARED_WRITE &&
table_list->mdl_request.type < MDL_EXCLUSIVE)
{
switch(sql_command) {
case SQLCOM_LOCK_TABLES:
case SQLCOM_DELETE:
case SQLCOM_UPDATE:
case SQLCOM_REPLACE:
case SQLCOM_REPLACE_SELECT:
case SQLCOM_DELETE_MULTI:
case SQLCOM_UPDATE_MULTI:
return true;
}
default:;
}
return false;
}
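so that both call sites reduce to something like (hypothetical usage):

    if (need_set_hist_part(table_list, thd->lex->sql_command))
      /* ... call vers_set_hist_part() as before ... */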
> + ot_ctx->vers_create_count= table->part_info->vers_set_hist_part(thd, true);
> + if (ot_ctx->vers_create_count)
> + {
> + ot_ctx->request_backoff_action(Open_table_context::OT_ADD_HISTORY_PARTITION,
> + table_list);
> + MYSQL_UNBIND_TABLE(table->file);
> + tc_release_table(table);
There's no MYSQL_UNBIND_TABLE/tc_release_table after other
request_backoff_action invocations. Why is this one special?
> + DBUG_RETURN(TRUE);
> + }
> + default:;
> + }
> + }
> +#endif
> +
> if (!(flags & MYSQL_OPEN_HAS_MDL_LOCK) &&
> table->s->table_category < TABLE_CATEGORY_INFORMATION)
> {
> diff --git a/sql/sql_class.h b/sql/sql_class.h
> index 14516811262..145ac5c5f64 100644
> --- a/sql/sql_class.h
> +++ b/sql/sql_class.h
> @@ -3346,6 +3347,10 @@ class THD: public THD_count, /* this must be first */
>
> #ifdef WITH_PARTITION_STORAGE_ENGINE
> partition_info *work_part_info;
> + /**
> + List of tables requiring new history partition.
> + */
> + List<TABLE_SHARE> vers_auto_part_tables;
this doesn't seem to be used anywhere
> #endif
>
> #ifndef EMBEDDED_LIBRARY
> diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy
> index 7cc1faea79b..3f6c1793432 100644
> --- a/sql/sql_yacc.yy
> +++ b/sql/sql_yacc.yy
> @@ -7521,14 +7529,17 @@ add_partition_rule:
>
> add_part_extra:
> /* empty */
> - | '(' part_def_list ')'
> + | '(' part_def_list ')' opt_vers_auto_inc
> {
> - LEX *lex= Lex;
> - lex->part_info->num_parts= lex->part_info->partitions.elements;
> + Lex->part_info->num_parts= Lex->part_info->partitions.elements;
> + if ($4)
> + Lex->alter_info.partition_flags|= ALTER_PARTITION_AUTO_HIST;
> }
> - | PARTITIONS_SYM real_ulong_num
> + | PARTITIONS_SYM real_ulong_num opt_vers_auto_inc
> {
> Lex->part_info->num_parts= $2;
> + if ($3)
> + Lex->alter_info.partition_flags|= ALTER_PARTITION_AUTO_HIST;
I'm confused. I thought that ALTER_PARTITION_AUTO_HIST is what you set
in vers_add_auto_parts() to mark auto-adding of a new partition.
But here you set it in the parser, when a user specifies AUTO in
partition specifications. So it seems you're using this flag for two
very different purposes. How do you distinguish between them?
And why did you decide to do it this way?
> }
> ;
>
> diff --git a/sql/sql_partition.cc b/sql/sql_partition.cc
> index fbed614489d..25017ee5425 100644
> --- a/sql/sql_partition.cc
> +++ b/sql/sql_partition.cc
> @@ -4825,6 +4829,13 @@ uint prep_alter_part_table(THD *thd, TABLE *table, Alter_info *alter_info,
> DBUG_RETURN(TRUE);
> }
>
> + if (alter_info->partition_flags & ALTER_PARTITION_AUTO_HIST &&
> + (!table->part_info || !table->part_info->vers_info))
> + {
> + my_error(ER_SYNTAX_ERROR, MYF(0));
can it even happen? Or should it be DBUG_ASSERT(0) here?
> + DBUG_RETURN(TRUE);
> + }
> +
> partition_info *alt_part_info= thd->lex->part_info;
> /*
> This variable is TRUE in very special case when we add only DEFAULT
> @@ -5312,7 +5323,9 @@ that are reorganised.
> now_part= el;
> }
> }
> - if (*fast_alter_table && tab_part_info->vers_info->interval.is_set())
> + if (*fast_alter_table &&
> + !(alter_info->partition_flags & ALTER_PARTITION_AUTO_HIST) &&
> + tab_part_info->vers_info->interval.is_set())
this !ALTER_PARTITION_AUTO_HIST check - do you mean "not called from
vers_add_auto_parts()", or "not an SQL statement that uses AUTO"?
> {
> partition_element *hist_part= tab_part_info->vers_info->hist_part;
> if (hist_part->range_value <= thd->query_start())
> @@ -5347,7 +5360,8 @@ that are reorganised.
> */
> if (!(alter_info->partition_flags & ALTER_PARTITION_TABLE_REORG))
> {
> - if (!alt_part_info->use_default_partitions)
> + if (!alt_part_info->use_default_partitions &&
> + !(alter_info->partition_flags & ALTER_PARTITION_AUTO_HIST))
Sorry, I don't understand that if() at all. What does it do?
> {
> DBUG_PRINT("info", ("part_info: %p", tab_part_info));
> tab_part_info->use_default_partitions= FALSE;
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Hi Alexey,
Input on the latest patch for MDEV-17399.
It's good to see the patch taking shape, but have you tried running MySQL's
tests for JSON_TABLE on it?
I did, and it caught several crashes, as well as error-vs-no-error
discrepancies and different query results. Some of these were expected, but
some show that something needs to be fixed.
Please find the first few cases below.
It seems there is something odd going on with the name resolution, both in
MariaDB and MySQL - I don't have an explanation for some of the errors. I intend
to study the issue more and elaborate in a later email.
Meanwhile, the first few obvious cases:
=== Crash in name resolution ===
CREATE TABLE t1 (a INT, b INT);
CREATE VIEW v2 AS SELECT * FROM t1 LIMIT 2;
SELECT b
FROM (SELECT * FROM v2) vq1,
JSON_TABLE(CONCAT(vq1.b,'[{\"a\":\"3\"}]'),
'$[*]' COLUMNS (id FOR ORDINALITY,
jpath VARCHAR(100) PATH '$.a',
JEXST INT EXISTS PATH '$.b')
) AS dt;
=== Crash in error reporting ===
select * from
json_table(
'[{"a":"3"},{"a":2},{"b":1},{"a":0}]',
"!@#$!@#$" columns (id for ordinality,
jpath varchar(100) path '$.a',
jexst int exists path '$.b')
) as tt;
=== Item print misses quotes ===
CREATE VIEW v2 AS
SELECT * FROM JSON_TABLE('{}', '$' COLUMNS (
x VARCHAR(10) PATH '$.a' DEFAULT '"isn''t here"' ON EMPTY)
) t;
SHOW CREATE VIEW v2;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 't here"' ON EMPTY)) `t`' at line 1
=== Error for outer reference ===
# JSON_TABLE referring outer scope
CREATE TABLE t1(id int, jd JSON);
INSERT INTO t1 values (1, '[1,3,5]'),(2,'[2,4,6]');
SELECT * FROM t1 WHERE id IN
(SELECT * FROM JSON_TABLE(t1.jd, '$[*]' COLUMNS
(id INT PATH '$')) AS jt);
Produces:
ERROR 1054 (42S22): Unknown column 't1.jd' in 'JSON_TABLE argument'
Should not produce it
(Note that for some other kinds of outer references it seemed to work)
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
[Maria-developers] 3f48bc08b92: MDEV-23033: All slaves crash once in ~24 hours and loop restart with signal 11
by Sujatha, 31 Dec '20
revision-id: 3f48bc08b92b2734c2af7408c246b96338675205 (mariadb-10.2.31-646-g3f48bc08b92)
parent(s): 78292047a4747ccd9210ba36a185a1dbe825de89
author: Sujatha
committer: Sujatha
timestamp: 2020-12-31 15:43:53 +0530
message:
MDEV-23033: All slaves crash once in ~24 hours and loop restart with signal 11
Problem:
=======
Upon deleting or updating a row in a parent table (with a primary key), if
the child table has a virtual column and an associated key with ON UPDATE
CASCADE/ON DELETE CASCADE, it will result in a slave crash.
Analysis:
========
Tables which are related through foreign key require prelocking similar to
triggers. i.e If a table has triggers/foreign keys we should add all tables
and routines used by them to the prelocking set. This prelocking happens
during 'open_and_lock_tables' call. Each table being opened is checked for
foreign key references. If foreign key reference exists then the child
table is opened and it is linked to the table_list. Upon any modification
to parent table its corresponding child tables are retried from table_list
and they are updated accordingly. This prelocking work fine on master.
On the slave, prelocking works for the following cases:
- Statement/mixed based replication
- In row based replication, when trigger execution is enabled through
'slave_run_triggers_for_rbr=YES/LOGGING/ENFORCE'
Otherwise it results in an assert/crash, as the parent table will not find
the corresponding child table and it will be NULL. Dereferencing the NULL
pointer leads to a slave server exit.
Fix:
===
Introduce a new 'slave_fk_event_map' flag, similar to 'trg_event_map'. This
flag ensures that when foreign keys are involved in row based replication,
all the parent and child tables are prelocked, so that the parent is able
to locate the child table.
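A minimal sketch of the idea above (illustrative only; the names come from
this commit message and the existing trg_event_map machinery, the actual
patch below is authoritative):

    /* Hypothetical helper: map a row-event type to FK event bits,
       analogous to trg_event_map, so that open_tables() can decide
       whether FK-referenced child tables must be prelocked. */
    static uint8 slave_fk_event_map_for(Log_event_type ev)
    {
      switch (ev)
      {
      case WRITE_ROWS_EVENT:  return 1U << TRG_EVENT_INSERT;
      case UPDATE_ROWS_EVENT: return 1U << TRG_EVENT_UPDATE;
      case DELETE_ROWS_EVENT: return 1U << TRG_EVENT_DELETE;
      default:                return 0;
      }
    }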
---
mysql-test/suite/rpl/r/rpl_row_vcol_crash.result | 381 +++++++++++++++++++++
mysql-test/suite/rpl/t/rpl_row_vcol_crash.test | 410 +++++++++++++++++++++++
sql/log_event.cc | 44 ++-
sql/sql_base.cc | 6 +-
sql/table.h | 6 +-
5 files changed, 821 insertions(+), 26 deletions(-)
diff --git a/mysql-test/suite/rpl/r/rpl_row_vcol_crash.result b/mysql-test/suite/rpl/r/rpl_row_vcol_crash.result
new file mode 100644
index 00000000000..e2a5d9b65e5
--- /dev/null
+++ b/mysql-test/suite/rpl/r/rpl_row_vcol_crash.result
@@ -0,0 +1,381 @@
+include/master-slave.inc
+[connection master]
+#
+# Test case: KEY on a virtual column with ON DELETE CASCADE
+#
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+INSERT INTO t1 VALUES (1),(2),(3);
+CREATE TABLE t2 (id INT NOT NULL PRIMARY KEY,
+t1_id INT NOT NULL,
+v_col INT AS (t1_id+1) VIRTUAL, KEY (v_col), KEY (t1_id),
+CONSTRAINT a FOREIGN KEY (t1_id) REFERENCES t1 (id) ON DELETE CASCADE
+) ENGINE=InnoDB;
+INSERT INTO t2 VALUES (90,1,NULL);
+INSERT INTO t2 VALUES (91,2,default);
+DELETE FROM t1 WHERE id=1;
+connection slave;
+#
+# Verify data consistency on slave
+#
+include/diff_tables.inc [master:test.t1, slave:test.t1]
+include/diff_tables.inc [master:test.t2, slave:test.t2]
+connection master;
+DROP TABLE t2,t1;
+connection slave;
+#
+# Test case: Verify cascading on delete with multiple levels of child
+# tables works fine.
+# Parent table: users
+# Child tables: matchmaking_groups, matchmaking_group_users
+# Parent table: matchmaking_groups
+# Child tables: matchmaking_group_users, matchmaking_group_maps
+#
+# Verify ON DELETE CASCASE works fine when a row is deleted
+# from parent table users
+# matchmaking_groups->matchmaking_group_users->matchmaking_group_maps
+# users->matchmaking_group_users->matchmaking_group_maps
+#
+connection master;
+CREATE TABLE users (id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
+name VARCHAR(32) NOT NULL DEFAULT ''
+) ENGINE=InnoDB DEFAULT CHARSET=utf8;
+CREATE TABLE matchmaking_groups (
+id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
+host_user_id INT UNSIGNED NOT NULL UNIQUE,
+v_col INT AS (host_user_id+1) VIRTUAL, KEY (v_col),
+CONSTRAINT FOREIGN KEY (host_user_id) REFERENCES users (id)
+ON DELETE CASCADE ON UPDATE CASCADE
+) ENGINE=InnoDB DEFAULT CHARSET=utf8;
+CREATE TABLE matchmaking_group_users (
+matchmaking_group_id BIGINT UNSIGNED NOT NULL,
+user_id INT UNSIGNED NOT NULL,
+v_col1 int as (user_id+1) virtual, KEY (v_col1),
+PRIMARY KEY (matchmaking_group_id,user_id),
+UNIQUE KEY user_id (user_id),
+CONSTRAINT FOREIGN KEY (matchmaking_group_id)
+REFERENCES matchmaking_groups (id) ON DELETE CASCADE ON UPDATE CASCADE,
+CONSTRAINT FOREIGN KEY (user_id)
+REFERENCES users (id) ON DELETE CASCADE ON UPDATE CASCADE
+) ENGINE=InnoDB DEFAULT CHARSET=utf8;
+CREATE TABLE matchmaking_group_maps (
+matchmaking_group_id BIGINT UNSIGNED NOT NULL,
+map_id TINYINT UNSIGNED NOT NULL,
+v_col2 INT AS (map_id+1) VIRTUAL, KEY (v_col2),
+PRIMARY KEY (matchmaking_group_id,map_id),
+CONSTRAINT FOREIGN KEY (matchmaking_group_id)
+REFERENCES matchmaking_groups (id) ON DELETE CASCADE ON UPDATE CASCADE
+) ENGINE=InnoDB DEFAULT CHARSET=utf8;
+connection slave;
+connection master;
+INSERT INTO users VALUES (NULL,'foo'),(NULL,'bar');
+INSERT INTO matchmaking_groups VALUES (10,1,default),(11,2,default);
+INSERT INTO matchmaking_group_users VALUES (10,1,default),(11,2,default);
+INSERT INTO matchmaking_group_maps VALUES (10,55,default),(11,66,default);
+DELETE FROM matchmaking_groups WHERE id = 10;
+connection slave;
+#
+# No rows should be returned as ON DELETE CASCASE should have removed
+# corresponding rows from child tables. There should not any mismatch
+# of 'id' field between parent->child.
+#
+SELECT * FROM matchmaking_group_users WHERE matchmaking_group_id NOT IN (SELECT id FROM matchmaking_groups);
+matchmaking_group_id user_id v_col1
+SELECT * FROM matchmaking_group_maps WHERE matchmaking_group_id NOT IN (SELECT id FROM matchmaking_groups);
+matchmaking_group_id map_id v_col2
+#
+# Rows with id=11 should be present
+#
+SELECT * FROM matchmaking_group_users;
+matchmaking_group_id user_id v_col1
+11 2 3
+SELECT * FROM matchmaking_group_maps;
+matchmaking_group_id map_id v_col2
+11 66 67
+connection master;
+DELETE FROM users WHERE id = 2;
+connection slave;
+#
+# No rows should be present in both the child tables
+#
+SELECT * FROM matchmaking_group_users;
+matchmaking_group_id user_id v_col1
+SELECT * FROM matchmaking_group_maps;
+matchmaking_group_id map_id v_col2
+connection master;
+DROP TABLE matchmaking_group_maps, matchmaking_group_users, matchmaking_groups, users;
+connection slave;
+#
+# Test case: ON UPDATE CASCADE scenario
+#
+connection master;
+CREATE TABLE t1 (a INT NOT NULL PRIMARY KEY, b INT NOT NULL) ENGINE=InnoDB;
+INSERT INTO t1 VALUES (1, 80);
+CREATE TABLE t2 (a INT KEY, b INT,
+v_col int as (b+1) virtual, KEY (v_col),
+CONSTRAINT b FOREIGN KEY (b) REFERENCES t1(a) ON UPDATE CASCADE
+) ENGINE=InnoDB;
+INSERT INTO t2 VALUES (51, 1, default);
+connection slave;
+connection master;
+UPDATE t1 SET a = 50 WHERE a = 1;
+#
+# Master: Verify that ON UPDATE CASCADE works fine
+# old_row: (51, 1, 2) ON UPDATE New_row: (51, 50, 51)
+#
+SELECT * FROM t2 WHERE b=50;
+a b v_col
+51 50 51
+connection slave;
+#
+# Slave: Verify that ON UPDATE CASCADE works fine
+# old_row: (51, 1, 2) ON UPDATE New_row: (51, 50, 51)
+#
+SELECT * FROM t2 WHERE b=50;
+a b v_col
+51 50 51
+connection master;
+DROP TABLE t2, t1;
+connection slave;
+#
+# Test case: When triggers are defined on master they should be
+# replicated as part of row events and they should be
+# applied on slave with the default
+# slave_run_tiggers_for_rbr=NO
+#
+connection master;
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+CREATE TABLE t2 (count INT NOT NULL) ENGINE=InnoDB;
+CREATE TRIGGER trg AFTER INSERT ON t1 FOR EACH ROW INSERT INTO t2 VALUES (1);
+INSERT INTO t1 VALUES (2),(3);
+connection slave;
+SHOW GLOBAL VARIABLES LIKE 'slave_run_triggers_for_rbr';
+Variable_name Value
+slave_run_triggers_for_rbr NO
+#
+# As two rows are inserted in table 't1', two rows should get inserted
+# into table 't2' as part of trigger.
+#
+include/assert.inc [Table t2 should have two rows.]
+connection master;
+DROP TABLE t1,t2;
+connection slave;
+#
+# Test case: On master create triggers and tables with foreign key
+# relation. Upon replication to slave verify their results
+# are fine. Master and Slave should be in sync.
+#
+connection master;
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+CREATE TABLE t2 (t1_id INT NOT NULL,
+v_col INT AS (t1_id+1) VIRTUAL, KEY (v_col), KEY (t1_id),
+CONSTRAINT a FOREIGN KEY (t1_id) REFERENCES t1 (id) ON DELETE CASCADE
+) ENGINE=InnoDB;
+CREATE TABLE t3 (count INT NOT NULL) ENGINE=InnoDB;
+CREATE TRIGGER trg AFTER INSERT ON t1 FOR EACH ROW INSERT INTO t3 VALUES (1);
+INSERT INTO t1 VALUES (2),(3);
+INSERT INTO t2 VALUES (2, default), (3, default);
+connection slave;
+#
+# As two rows are inserted in table 't1', two rows should get inserted
+# into table 't3' as part of trigger.
+#
+include/assert.inc [Table t3 should have two rows.]
+#
+# Verify ON DELETE CASCADE correctness
+#
+connection master;
+DELETE FROM t1 WHERE id=2;
+connection slave;
+connection master;
+include/diff_tables.inc [master:test.t1, slave:test.t1]
+include/diff_tables.inc [master:test.t2, slave:test.t2]
+include/diff_tables.inc [master:test.t3, slave:test.t3]
+DROP TABLE t3,t2,t1;
+connection slave;
+#
+# Test case: Triggers are present only on slave and
+# 'slave_run_triggers_for_rbr=NO'
+#
+connection slave;
+SET @save_slave_run_triggers_for_rbr= @@GLOBAL.slave_run_triggers_for_rbr;
+SET GLOBAL slave_run_triggers_for_rbr= NO;
+SHOW GLOBAL VARIABLES LIKE '%slave_run_triggers_for_rbr%';
+Variable_name Value
+slave_run_triggers_for_rbr NO
+connection master;
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+CREATE TABLE t2 (t1_id INT NOT NULL,
+v_col INT AS (t1_id+1) VIRTUAL, KEY (v_col),
+KEY (t1_id), CONSTRAINT a FOREIGN KEY (t1_id) REFERENCES t1 (id) ON DELETE CASCADE
+) ENGINE=InnoDB;
+CREATE TABLE t3 (count INT NOT NULL) ENGINE=InnoDB;
+connection slave;
+CREATE TRIGGER trg AFTER INSERT ON t2 FOR EACH ROW INSERT INTO t3 VALUES (1);
+connection master;
+INSERT INTO t1 VALUES (2),(3);
+INSERT INTO t2 VALUES (2, default), (3, default);
+connection slave;
+#
+# Count must be 0
+#
+include/assert.inc [Table t3 should have zero rows.]
+connection master;
+DELETE FROM t1 WHERE id=2;
+connection slave;
+SET GLOBAL slave_run_triggers_for_rbr= @save_slave_run_triggers_for_rbr;
+#
+# Verify t1,t2 are consistent on slave.
+#
+include/diff_tables.inc [master:test.t1, slave:test.t1]
+include/diff_tables.inc [master:test.t2, slave:test.t2]
+connection master;
+DROP TABLE t3,t2,t1;
+connection slave;
+#
+# Test case: Triggers are present only on slave and
+# 'slave_run_triggers_for_rbr=YES'
+#
+connection slave;
+SET @save_slave_run_triggers_for_rbr= @@GLOBAL.slave_run_triggers_for_rbr;
+SET GLOBAL slave_run_triggers_for_rbr= YES;
+SHOW GLOBAL VARIABLES LIKE '%slave_run_triggers_for_rbr%';
+Variable_name Value
+slave_run_triggers_for_rbr YES
+connection master;
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+CREATE TABLE t2 (t1_id INT NOT NULL,
+v_col INT AS (t1_id+1) VIRTUAL, KEY (v_col),
+KEY (t1_id), CONSTRAINT a FOREIGN KEY (t1_id) REFERENCES t1 (id) ON DELETE CASCADE
+) ENGINE=InnoDB;
+CREATE TABLE t3 (count INT NOT NULL) ENGINE=InnoDB;
+connection slave;
+CREATE TRIGGER trg AFTER INSERT ON t2 FOR EACH ROW INSERT INTO t3 VALUES (1);
+connection master;
+INSERT INTO t1 VALUES (2),(3);
+INSERT INTO t2 VALUES (2, default), (3, default);
+connection slave;
+#
+# Count must be 2
+#
+include/assert.inc [Table t3 should have two rows.]
+connection master;
+DELETE FROM t1 WHERE id=2;
+connection slave;
+SET GLOBAL slave_run_triggers_for_rbr= @save_slave_run_triggers_for_rbr;
+#
+# Verify t1,t2 are consistent on slave.
+#
+include/diff_tables.inc [master:test.t1, slave:test.t1]
+include/diff_tables.inc [master:test.t2, slave:test.t2]
+connection master;
+DROP TABLE t3,t2,t1;
+connection slave;
+#
+# Test case: Triggers and Foreign Keys are present only on slave and
+# 'slave_run_triggers_for_rbr=NO'
+#
+connection slave;
+SET @save_slave_run_triggers_for_rbr= @@GLOBAL.slave_run_triggers_for_rbr;
+SET GLOBAL slave_run_triggers_for_rbr= NO;
+SHOW GLOBAL VARIABLES LIKE '%slave_run_triggers_for_rbr%';
+Variable_name Value
+slave_run_triggers_for_rbr NO
+connection master;
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+SET sql_log_bin=0;
+CREATE TABLE t2 (t1_id INT NOT NULL,v_col INT AS (t1_id+1) VIRTUAL) ENGINE=INNODB;
+SET sql_log_bin=1;
+CREATE TABLE t3 (count INT NOT NULL) ENGINE=InnoDB;
+connection slave;
+CREATE TABLE t2 (t1_id INT NOT NULL,
+v_col INT AS (t1_id+1) VIRTUAL, KEY (v_col), KEY (t1_id),
+CONSTRAINT a FOREIGN KEY (t1_id) REFERENCES t1 (id) ON DELETE CASCADE
+) ENGINE=InnoDB;
+CREATE TRIGGER trg AFTER INSERT ON t2 FOR EACH ROW INSERT INTO t3 VALUES (1);
+connection master;
+INSERT INTO t1 VALUES (2),(3);
+INSERT INTO t2 VALUES (2, default), (3, default);
+connection slave;
+#
+# Count must be 0
+#
+include/assert.inc [Table t3 should have zero rows.]
+connection master;
+DELETE FROM t1 WHERE id=2;
+# t1: Should have one row
+SELECT * FROM t1;
+id
+3
+# t2: Should have two rows
+SELECT * FROM t2;
+t1_id v_col
+2 3
+3 4
+connection slave;
+# t1: Should have one row
+SELECT * FROM t1;
+id
+3
+# t2: Should have one row on slave due to ON DELETE CASCADE
+SELECT * FROM t2;
+t1_id v_col
+3 4
+SET GLOBAL slave_run_triggers_for_rbr= @save_slave_run_triggers_for_rbr;
+connection master;
+DROP TABLE t3,t2,t1;
+connection slave;
+#
+# Test case: Triggers and Foreign Keys are present only on slave and
+# 'slave_run_triggers_for_rbr=YES'
+#
+connection slave;
+SET @save_slave_run_triggers_for_rbr= @@GLOBAL.slave_run_triggers_for_rbr;
+SET GLOBAL slave_run_triggers_for_rbr= YES;
+SHOW GLOBAL VARIABLES LIKE '%slave_run_triggers_for_rbr%';
+Variable_name Value
+slave_run_triggers_for_rbr YES
+connection master;
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+SET sql_log_bin=0;
+CREATE TABLE t2 (t1_id INT NOT NULL,v_col INT AS (t1_id+1) VIRTUAL) ENGINE=INNODB;
+SET sql_log_bin=1;
+CREATE TABLE t3 (count INT NOT NULL) ENGINE=InnoDB;
+connection slave;
+CREATE TABLE t2 (t1_id INT NOT NULL,
+v_col INT AS (t1_id+1) VIRTUAL, KEY (v_col), KEY (t1_id),
+CONSTRAINT a FOREIGN KEY (t1_id) REFERENCES t1 (id) ON DELETE CASCADE
+) ENGINE=InnoDB;
+CREATE TRIGGER trg AFTER INSERT ON t2 FOR EACH ROW INSERT INTO t3 VALUES (1);
+connection master;
+INSERT INTO t1 VALUES (2),(3);
+INSERT INTO t2 VALUES (2, default), (3, default);
+connection slave;
+#
+# Count must be 2
+#
+include/assert.inc [Table t3 should have two rows.]
+connection master;
+DELETE FROM t1 WHERE id=2;
+# t1: Should have one row
+SELECT * FROM t1;
+id
+3
+# t2: Should have two rows
+SELECT * FROM t2;
+t1_id v_col
+2 3
+3 4
+connection slave;
+# t1: Should have one row
+SELECT * FROM t1;
+id
+3
+# t2: Should have one row on slave due to ON DELETE CASCADE
+SELECT * FROM t2;
+t1_id v_col
+3 4
+SET GLOBAL slave_run_triggers_for_rbr= @save_slave_run_triggers_for_rbr;
+connection master;
+DROP TABLE t3,t2,t1;
+connection slave;
+include/rpl_end.inc
diff --git a/mysql-test/suite/rpl/t/rpl_row_vcol_crash.test b/mysql-test/suite/rpl/t/rpl_row_vcol_crash.test
new file mode 100644
index 00000000000..2ed7c219aeb
--- /dev/null
+++ b/mysql-test/suite/rpl/t/rpl_row_vcol_crash.test
@@ -0,0 +1,410 @@
+# ==== Purpose ====
+#
+# Test verifies that the slave doesn't report any assert on UPDATE or DELETE
+# of a row which tries to update the virtual columns with associated KEYs.
+#
+# ==== References ====
+#
+# MDEV-23033: All slaves crash once in ~24 hours and loop restart with signal 11
+#
+
+--source include/have_binlog_format_row.inc
+--source include/have_innodb.inc
+--source include/master-slave.inc
+
+
+--echo #
+--echo # Test case: KEY on a virtual column with ON DELETE CASCADE
+--echo #
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+INSERT INTO t1 VALUES (1),(2),(3);
+
+CREATE TABLE t2 (id INT NOT NULL PRIMARY KEY,
+ t1_id INT NOT NULL,
+ v_col INT AS (t1_id+1) VIRTUAL, KEY (v_col), KEY (t1_id),
+ CONSTRAINT a FOREIGN KEY (t1_id) REFERENCES t1 (id) ON DELETE CASCADE
+) ENGINE=InnoDB;
+
+INSERT INTO t2 VALUES (90,1,NULL);
+INSERT INTO t2 VALUES (91,2,default);
+
+# Following query results in an assert on slave
+DELETE FROM t1 WHERE id=1;
+--sync_slave_with_master
+
+--echo #
+--echo # Verify data consistency on slave
+--echo #
+--let $diff_tables= master:test.t1, slave:test.t1
+--source include/diff_tables.inc
+--let $diff_tables= master:test.t2, slave:test.t2
+--source include/diff_tables.inc
+
+--connection master
+DROP TABLE t2,t1;
+--sync_slave_with_master
+
+--echo #
+--echo # Test case: Verify cascading on delete with multiple levels of child
+--echo # tables works fine.
+--echo # Parent table: users
+--echo # Child tables: matchmaking_groups, matchmaking_group_users
+--echo # Parent table: matchmaking_groups
+--echo # Child tables: matchmaking_group_users, matchmaking_group_maps
+--echo #
+--echo # Verify ON DELETE CASCADE works fine when a row is deleted
+--echo # from parent table users
+--echo # matchmaking_groups->matchmaking_group_users->matchmaking_group_maps
+--echo # users->matchmaking_group_users->matchmaking_group_maps
+--echo #
+
+--connection master
+CREATE TABLE users (id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
+ name VARCHAR(32) NOT NULL DEFAULT ''
+) ENGINE=InnoDB DEFAULT CHARSET=utf8;
+
+CREATE TABLE matchmaking_groups (
+ id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
+ host_user_id INT UNSIGNED NOT NULL UNIQUE,
+ v_col INT AS (host_user_id+1) VIRTUAL, KEY (v_col),
+ CONSTRAINT FOREIGN KEY (host_user_id) REFERENCES users (id)
+ ON DELETE CASCADE ON UPDATE CASCADE
+) ENGINE=InnoDB DEFAULT CHARSET=utf8;
+
+CREATE TABLE matchmaking_group_users (
+ matchmaking_group_id BIGINT UNSIGNED NOT NULL,
+ user_id INT UNSIGNED NOT NULL,
+ v_col1 int as (user_id+1) virtual, KEY (v_col1),
+ PRIMARY KEY (matchmaking_group_id,user_id),
+ UNIQUE KEY user_id (user_id),
+ CONSTRAINT FOREIGN KEY (matchmaking_group_id)
+ REFERENCES matchmaking_groups (id) ON DELETE CASCADE ON UPDATE CASCADE,
+ CONSTRAINT FOREIGN KEY (user_id)
+ REFERENCES users (id) ON DELETE CASCADE ON UPDATE CASCADE
+) ENGINE=InnoDB DEFAULT CHARSET=utf8;
+
+CREATE TABLE matchmaking_group_maps (
+ matchmaking_group_id BIGINT UNSIGNED NOT NULL,
+ map_id TINYINT UNSIGNED NOT NULL,
+ v_col2 INT AS (map_id+1) VIRTUAL, KEY (v_col2),
+ PRIMARY KEY (matchmaking_group_id,map_id),
+ CONSTRAINT FOREIGN KEY (matchmaking_group_id)
+ REFERENCES matchmaking_groups (id) ON DELETE CASCADE ON UPDATE CASCADE
+) ENGINE=InnoDB DEFAULT CHARSET=utf8;
+--sync_slave_with_master
+
+--connection master
+INSERT INTO users VALUES (NULL,'foo'),(NULL,'bar');
+INSERT INTO matchmaking_groups VALUES (10,1,default),(11,2,default);
+INSERT INTO matchmaking_group_users VALUES (10,1,default),(11,2,default);
+INSERT INTO matchmaking_group_maps VALUES (10,55,default),(11,66,default);
+
+DELETE FROM matchmaking_groups WHERE id = 10;
+--sync_slave_with_master
+
+--echo #
+--echo # No rows should be returned as ON DELETE CASCADE should have removed
+--echo # corresponding rows from child tables. There should not be any mismatch
+--echo # of the 'id' field between parent and child.
+--echo #
+SELECT * FROM matchmaking_group_users WHERE matchmaking_group_id NOT IN (SELECT id FROM matchmaking_groups);
+SELECT * FROM matchmaking_group_maps WHERE matchmaking_group_id NOT IN (SELECT id FROM matchmaking_groups);
+
+--echo #
+--echo # Rows with id=11 should be present
+--echo #
+SELECT * FROM matchmaking_group_users;
+SELECT * FROM matchmaking_group_maps;
+
+--connection master
+DELETE FROM users WHERE id = 2;
+--sync_slave_with_master
+
+--echo #
+--echo # No rows should be present in both the child tables
+--echo #
+SELECT * FROM matchmaking_group_users;
+SELECT * FROM matchmaking_group_maps;
+
+--connection master
+DROP TABLE matchmaking_group_maps, matchmaking_group_users, matchmaking_groups, users;
+--sync_slave_with_master
+
+--echo #
+--echo # Test case: ON UPDATE CASCADE scenario
+--echo #
+
+--connection master
+CREATE TABLE t1 (a INT NOT NULL PRIMARY KEY, b INT NOT NULL) ENGINE=InnoDB;
+INSERT INTO t1 VALUES (1, 80);
+
+CREATE TABLE t2 (a INT KEY, b INT,
+ v_col int as (b+1) virtual, KEY (v_col),
+ CONSTRAINT b FOREIGN KEY (b) REFERENCES t1(a) ON UPDATE CASCADE
+) ENGINE=InnoDB;
+INSERT INTO t2 VALUES (51, 1, default);
+--sync_slave_with_master
+
+--connection master
+UPDATE t1 SET a = 50 WHERE a = 1;
+
+--echo #
+--echo # Master: Verify that ON UPDATE CASCADE works fine
+--echo # old_row: (51, 1, 2) ON UPDATE New_row: (51, 50, 51)
+--echo #
+SELECT * FROM t2 WHERE b=50;
+--sync_slave_with_master
+
+--echo #
+--echo # Slave: Verify that ON UPDATE CASCADE works fine
+--echo # old_row: (51, 1, 2) ON UPDATE New_row: (51, 50, 51)
+--echo #
+SELECT * FROM t2 WHERE b=50;
+
+--connection master
+DROP TABLE t2, t1;
+--sync_slave_with_master
+
+--echo #
+--echo # Test case: When triggers are defined on master they should be
+--echo # replicated as part of row events and they should be
+--echo # applied on slave with the default
+--echo # slave_run_triggers_for_rbr=NO
+--echo #
+
+# In row-based replication the binary log contains row changes. It has both
+# the changes made by the statement itself and the changes made by the
+# triggers that were invoked by the statement, so slave servers do not need
+# to run triggers for the row changes they are applying. Verify that this
+# property holds and that the data is available as if the trigger had been
+# executed. Note that by default slave_run_triggers_for_rbr=NO.
+
+--connection master
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+CREATE TABLE t2 (count INT NOT NULL) ENGINE=InnoDB;
+CREATE TRIGGER trg AFTER INSERT ON t1 FOR EACH ROW INSERT INTO t2 VALUES (1);
+INSERT INTO t1 VALUES (2),(3);
+--sync_slave_with_master
+
+SHOW GLOBAL VARIABLES LIKE 'slave_run_triggers_for_rbr';
+--echo #
+--echo # As two rows are inserted in table 't1', two rows should get inserted
+--echo # into table 't2' as part of trigger.
+--echo #
+--let $assert_cond= COUNT(*) = 2 FROM t2
+--let $assert_text= Table t2 should have two rows.
+--source include/assert.inc
+
+--connection master
+DROP TABLE t1,t2;
+--sync_slave_with_master
+
+--echo #
+--echo # Test case: On master create triggers and tables with foreign key
+--echo # relation. Upon replication to slave verify their results
+--echo # are fine. Master and Slave should be in sync.
+--echo #
+--connection master
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+CREATE TABLE t2 (t1_id INT NOT NULL,
+ v_col INT AS (t1_id+1) VIRTUAL, KEY (v_col), KEY (t1_id),
+ CONSTRAINT a FOREIGN KEY (t1_id) REFERENCES t1 (id) ON DELETE CASCADE
+) ENGINE=InnoDB;
+CREATE TABLE t3 (count INT NOT NULL) ENGINE=InnoDB;
+CREATE TRIGGER trg AFTER INSERT ON t1 FOR EACH ROW INSERT INTO t3 VALUES (1);
+
+INSERT INTO t1 VALUES (2),(3);
+INSERT INTO t2 VALUES (2, default), (3, default);
+--sync_slave_with_master
+
+--echo #
+--echo # As two rows are inserted in table 't1', two rows should get inserted
+--echo # into table 't3' as part of trigger.
+--echo #
+--let $assert_cond= COUNT(*) = 2 FROM t3
+--let $assert_text= Table t3 should have two rows.
+--source include/assert.inc
+
+--echo #
+--echo # Verify ON DELETE CASCADE correctness
+--echo #
+--connection master
+DELETE FROM t1 WHERE id=2;
+--sync_slave_with_master
+
+--connection master
+--let $diff_tables= master:test.t1, slave:test.t1
+--source include/diff_tables.inc
+--let $diff_tables= master:test.t2, slave:test.t2
+--source include/diff_tables.inc
+--let $diff_tables= master:test.t3, slave:test.t3
+--source include/diff_tables.inc
+
+DROP TABLE t3,t2,t1;
+--sync_slave_with_master
+
+#
+# Test case: Triggers only on slave
+#
+--write_file $MYSQLTEST_VARDIR/tmp/trig_on_slave.inc PROCEDURE
+ if ($slave_run_triggers_for_rbr == '') {
+ --die !!!ERROR IN TEST: you must set $slave_run_triggers_for_rbr
+ }
+
+--connection slave
+SET @save_slave_run_triggers_for_rbr= @@GLOBAL.slave_run_triggers_for_rbr;
+--eval SET GLOBAL slave_run_triggers_for_rbr= $slave_run_triggers_for_rbr
+SHOW GLOBAL VARIABLES LIKE '%slave_run_triggers_for_rbr%';
+
+--connection master
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+CREATE TABLE t2 (t1_id INT NOT NULL,
+ v_col INT AS (t1_id+1) VIRTUAL, KEY (v_col),
+ KEY (t1_id), CONSTRAINT a FOREIGN KEY (t1_id) REFERENCES t1 (id) ON DELETE CASCADE
+) ENGINE=InnoDB;
+CREATE TABLE t3 (count INT NOT NULL) ENGINE=InnoDB;
+--sync_slave_with_master
+
+CREATE TRIGGER trg AFTER INSERT ON t2 FOR EACH ROW INSERT INTO t3 VALUES (1);
+
+--connection master
+INSERT INTO t1 VALUES (2),(3);
+INSERT INTO t2 VALUES (2, default), (3, default);
+--sync_slave_with_master
+
+if ($slave_run_triggers_for_rbr == 'NO') {
+--echo #
+--echo # Count must be 0
+--echo #
+--let $assert_cond= COUNT(*) = 0 FROM t3
+--let $assert_text= Table t3 should have zero rows.
+--source include/assert.inc
+}
+if ($slave_run_triggers_for_rbr == 'YES') {
+--echo #
+--echo # Count must be 2
+--echo #
+--let $assert_cond= COUNT(*) = 2 FROM t3
+--let $assert_text= Table t3 should have two rows.
+--source include/assert.inc
+}
+
+--connection master
+DELETE FROM t1 WHERE id=2;
+--sync_slave_with_master
+SET GLOBAL slave_run_triggers_for_rbr= @save_slave_run_triggers_for_rbr;
+
+--echo #
+--echo # Verify t1,t2 are consistent on slave.
+--echo #
+--let $diff_tables= master:test.t1, slave:test.t1
+--source include/diff_tables.inc
+--let $diff_tables= master:test.t2, slave:test.t2
+--source include/diff_tables.inc
+
+--connection master
+DROP TABLE t3,t2,t1;
+--sync_slave_with_master
+#END OF
+PROCEDURE
+
+--echo #
+--echo # Test case: Triggers are present only on slave and
+--echo # 'slave_run_triggers_for_rbr=NO'
+--echo #
+--let $slave_run_triggers_for_rbr=NO
+--source $MYSQLTEST_VARDIR/tmp/trig_on_slave.inc
+
+--echo #
+--echo # Test case: Triggers are present only on slave and
+--echo # 'slave_run_triggers_for_rbr=YES'
+--echo #
+--let $slave_run_triggers_for_rbr=YES
+--source $MYSQLTEST_VARDIR/tmp/trig_on_slave.inc
+--remove_file $MYSQLTEST_VARDIR/tmp/trig_on_slave.inc
+
+#
+# Test case: Trigger and Foreign Key are present only on slave
+#
+--write_file $MYSQLTEST_VARDIR/tmp/trig_fk_on_slave.inc PROCEDURE
+ if ($slave_run_triggers_for_rbr == '') {
+ --die !!!ERROR IN TEST: you must set $slave_run_triggers_for_rbr
+ }
+
+--connection slave
+SET @save_slave_run_triggers_for_rbr= @@GLOBAL.slave_run_triggers_for_rbr;
+--eval SET GLOBAL slave_run_triggers_for_rbr= $slave_run_triggers_for_rbr
+SHOW GLOBAL VARIABLES LIKE '%slave_run_triggers_for_rbr%';
+
+--connection master
+CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
+SET sql_log_bin=0;
+CREATE TABLE t2 (t1_id INT NOT NULL,v_col INT AS (t1_id+1) VIRTUAL) ENGINE=INNODB;
+SET sql_log_bin=1;
+CREATE TABLE t3 (count INT NOT NULL) ENGINE=InnoDB;
+--sync_slave_with_master
+
+# Have foreign key and trigger on slave.
+CREATE TABLE t2 (t1_id INT NOT NULL,
+ v_col INT AS (t1_id+1) VIRTUAL, KEY (v_col), KEY (t1_id),
+ CONSTRAINT a FOREIGN KEY (t1_id) REFERENCES t1 (id) ON DELETE CASCADE
+) ENGINE=InnoDB;
+CREATE TRIGGER trg AFTER INSERT ON t2 FOR EACH ROW INSERT INTO t3 VALUES (1);
+
+--connection master
+INSERT INTO t1 VALUES (2),(3);
+INSERT INTO t2 VALUES (2, default), (3, default);
+--sync_slave_with_master
+
+if ($slave_run_triggers_for_rbr == 'NO') {
+--echo #
+--echo # Count must be 0
+--echo #
+--let $assert_cond= COUNT(*) = 0 FROM t3
+--let $assert_text= Table t3 should have zero rows.
+--source include/assert.inc
+}
+if ($slave_run_triggers_for_rbr == 'YES') {
+--echo #
+--echo # Count must be 2
+--echo #
+--let $assert_cond= COUNT(*) = 2 FROM t3
+--let $assert_text= Table t3 should have two rows.
+--source include/assert.inc
+}
+
+--connection master
+DELETE FROM t1 WHERE id=2;
+--echo # t1: Should have one row
+SELECT * FROM t1;
+--echo # t2: Should have two rows
+SELECT * FROM t2;
+--sync_slave_with_master
+--echo # t1: Should have one row
+SELECT * FROM t1;
+--echo # t2: Should have one row on slave due to ON DELETE CASCADE
+SELECT * FROM t2;
+SET GLOBAL slave_run_triggers_for_rbr= @save_slave_run_triggers_for_rbr;
+
+--connection master
+DROP TABLE t3,t2,t1;
+--sync_slave_with_master
+#END OF
+PROCEDURE
+
+--echo #
+--echo # Test case: Triggers and Foreign Keys are present only on slave and
+--echo # 'slave_run_triggers_for_rbr=NO'
+--echo #
+--let $slave_run_triggers_for_rbr=NO
+--source $MYSQLTEST_VARDIR/tmp/trig_fk_on_slave.inc
+
+--echo #
+--echo # Test case: Triggers and Foreign Keys are present only on slave and
+--echo # 'slave_run_triggers_for_rbr=YES'
+--echo #
+--let $slave_run_triggers_for_rbr=YES
+--source $MYSQLTEST_VARDIR/tmp/trig_fk_on_slave.inc
+--remove_file $MYSQLTEST_VARDIR/tmp/trig_fk_on_slave.inc
+
+--source include/rpl_end.inc
diff --git a/sql/log_event.cc b/sql/log_event.cc
index c649e1f64fa..10aa0afbad8 100644
--- a/sql/log_event.cc
+++ b/sql/log_event.cc
@@ -10718,7 +10718,7 @@ int Rows_log_event::do_add_row_data(uchar *row_data, size_t length)
There was the same problem with MERGE MYISAM tables and so here we try to
go the same way.
*/
-static void restore_empty_query_table_list(LEX *lex)
+inline void restore_empty_query_table_list(LEX *lex)
{
if (lex->first_not_own_table())
(*lex->first_not_own_table()->prev_global)= NULL;
@@ -10733,6 +10733,8 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi)
TABLE* table;
DBUG_ENTER("Rows_log_event::do_apply_event(Relay_log_info*)");
int error= 0;
+ LEX *lex= thd->lex;
+ uint8 new_trg_event_map= get_trg_event_map();
/*
If m_table_id == ~0ULL, then we have a dummy event that does not
contain any data. In that case, we just remove all tables in the
@@ -10823,27 +10825,25 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi)
DBUG_ASSERT(!debug_sync_set_action(thd, STRING_WITH_LEN(action)));
};);
- if (slave_run_triggers_for_rbr)
- {
- LEX *lex= thd->lex;
- uint8 new_trg_event_map= get_trg_event_map();
-
- /*
- Trigger's procedures work with global table list. So we have to add
- rgi->tables_to_lock content there to get trigger's in the list.
+ /*
+ Triggers' procedures work with the global table list, so we have to add
+ rgi->tables_to_lock content there to get the triggers into the list.
- Then restore_empty_query_table_list() restore the list as it was
- */
- DBUG_ASSERT(lex->query_tables == NULL);
- if ((lex->query_tables= rgi->tables_to_lock))
- rgi->tables_to_lock->prev_global= &lex->query_tables;
+ Then restore_empty_query_table_list() restores the list as it was
+ */
+ DBUG_ASSERT(lex->query_tables == NULL);
+ if ((lex->query_tables= rgi->tables_to_lock))
+ rgi->tables_to_lock->prev_global= &lex->query_tables;
- for (TABLE_LIST *tables= rgi->tables_to_lock; tables;
- tables= tables->next_global)
- {
+ for (TABLE_LIST *tables= rgi->tables_to_lock; tables;
+ tables= tables->next_global)
+ {
+ if (slave_run_triggers_for_rbr)
tables->trg_event_map= new_trg_event_map;
- lex->query_tables_last= &tables->next_global;
- }
+ else
+ tables->slave_fk_event_map= new_trg_event_map;
+
+ lex->query_tables_last= &tables->next_global;
}
if (open_and_lock_tables(thd, rgi->tables_to_lock, FALSE, 0))
{
@@ -11193,8 +11193,7 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi)
}
/* remove trigger's tables */
- if (slave_run_triggers_for_rbr)
- restore_empty_query_table_list(thd->lex);
+ restore_empty_query_table_list(thd->lex);
#if defined(WITH_WSREP) && defined(HAVE_QUERY_CACHE)
if (WSREP(thd) && thd->wsrep_exec_mode == REPL_RECV)
@@ -11212,8 +11211,7 @@ int Rows_log_event::do_apply_event(rpl_group_info *rgi)
DBUG_RETURN(error);
err:
- if (slave_run_triggers_for_rbr)
- restore_empty_query_table_list(thd->lex);
+ restore_empty_query_table_list(thd->lex);
rgi->slave_close_thread_tables(thd);
DBUG_RETURN(error);
}
diff --git a/sql/sql_base.cc b/sql/sql_base.cc
index 8e57ea437b6..a7cddbca16a 100644
--- a/sql/sql_base.cc
+++ b/sql/sql_base.cc
@@ -4371,7 +4371,8 @@ handle_table(THD *thd, Query_tables_list *prelocking_ctx,
/* We rely on a caller to check that table is going to be changed. */
DBUG_ASSERT(table_list->lock_type >= TL_WRITE_ALLOW_WRITE);
- if (table_list->trg_event_map)
+ if (table_list->trg_event_map ||
+ (thd->rgi_slave && thd->is_current_stmt_binlog_format_row()))
{
if (table_list->table->triggers)
{
@@ -4405,7 +4406,8 @@ handle_table(THD *thd, Query_tables_list *prelocking_ctx,
{
// FK_OPTION_RESTRICT and FK_OPTION_NO_ACTION only need read access
static bool can_write[]= { true, false, true, true, false, true };
- uint8 op= table_list->trg_event_map;
+ uint8 op= ((table_list->trg_event_map) ? table_list->trg_event_map :
+ table_list->slave_fk_event_map);
thr_lock_type lock_type;
if ((op & (1 << TRG_EVENT_DELETE) && can_write[fk->delete_method])
diff --git a/sql/table.h b/sql/table.h
index 9a864f7ce9f..57706655d9b 100644
--- a/sql/table.h
+++ b/sql/table.h
@@ -2277,8 +2277,12 @@ struct TABLE_LIST
Indicates what triggers we need to pre-load for this TABLE_LIST
when opening an associated TABLE. This is filled after
the parsed tree is created.
+
+ slave_fk_event_map is filled on the slave side with a bitmap value
+ representing the row-based event operation, to help find and prelock
+ possible FK constraint-related child tables.
*/
- uint8 trg_event_map;
+ uint8 trg_event_map, slave_fk_event_map;
/* TRUE <=> this table is a const one and was optimized away. */
bool optimized_away;
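
One note for clarity, since the new comment mentions a "bitmap value":
trg_event_map and slave_fk_event_map use the same format, one bit per
trg_event_type. A minimal illustrative sketch (not part of the patch,
the variable names here are made up):

  uint8 map= 0;
  map|= 1U << TRG_EVENT_INSERT;   /* set for Write_rows events  */
  map|= 1U << TRG_EVENT_UPDATE;   /* set for Update_rows events */
  map|= 1U << TRG_EVENT_DELETE;   /* set for Delete_rows events */

  /* e.g. the prelocking code asks: does the event modify FK child rows? */
  bool needs_write= map & ((1U << TRG_EVENT_UPDATE) |
                           (1U << TRG_EVENT_DELETE));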
Re: [Maria-developers] [Commits] cdc305c8dd8: MDEV-19620: Changing join_buffer_size causes different results
by Sergey Petrunia 29 Dec '20
Hi Varun,
Please find input below.
On Mon, Dec 28, 2020 at 02:34:40PM +0530, varun wrote:
> revision-id: cdc305c8dd89a726e09e5fe70ff890d06609cbfb (mariadb-10.3.21-309-gcdc305c8dd8)
> parent(s): 043bd85a574a88856ab9c6d497e682ed06fe45e9
> author: Varun Gupta
> committer: Varun Gupta
> timestamp: 2020-12-28 14:12:14 +0530
> message:
>
> MDEV-19620: Changing join_buffer_size causes different results
>
> The scenario here is that the query optimization phase decides to use a hash join.
> When the join buffers are allocated in JOIN::init_join_caches, the size for
> a table may exceed join_buffer_space_limit (the limit on the space available
> for all join buffers). When this happens we disallow join buffering for the
> table; this is done in revise_cache_usage() and set_join_cache_denial().
>
> In this issue the hash key is created on an index for which ref access is
> possible, so when we disallow the hash join we switch to a table scan instead
> of switching back to REF access. This is a problem because the equijoin
> conditions for which a lookup can be made are not attached to the table (or
> are not evaluated for the table), which leads to incorrect results.
>
> The fix is to switch to using a lookup, because the join planner picked it
> as more efficient than the table scan.
>
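For the archive, the failure mode in SQL terms, using the tables from the
"HASH join + ref access" testcase below (a sketch, not verbatim test output):

  SET join_cache_level= 3;
  SET join_buffer_size= 1024;
  SET join_buffer_space_limit= 2048;  # too small for the hash join buffer
  ALTER TABLE t2 ADD KEY k1(c3);
  # The planner picks a hash join keyed on k1. Buffer allocation then
  # trips the space limit, buffering is denied, and before this fix the
  # plan degraded to a full scan of t2 with the c3=c2 lookup lost.
  EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);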
> ---
> mysql-test/main/join_cache.result | 138 ++++++++++++++++++++++++++++++++++++++
> mysql-test/main/join_cache.test | 105 +++++++++++++++++++++++++++++
> sql/sql_select.cc | 69 +++++++++++++++----
> sql/sql_select.h | 6 ++
> 4 files changed, 304 insertions(+), 14 deletions(-)
>
> diff --git a/mysql-test/main/join_cache.result b/mysql-test/main/join_cache.result
> index 3d1d91df997..e58503f422f 100644
> --- a/mysql-test/main/join_cache.result
> +++ b/mysql-test/main/join_cache.result
> @@ -6128,4 +6128,142 @@ EXPLAIN
> }
> }
> drop table t1,t2,t3;
> +#
> +# MDEV-19620: Changing join_buffer_size causes different results
> +#
> +SET @save_join_cache_level= @@join_cache_level;
> +SET @save_join_buffer_size= @@join_buffer_size;
> +SET @save_join_buffer_space_limit= @@join_buffer_space_limit;
> +SET join_cache_level = 3;
> +CREATE TABLE t1 (i2 VARCHAR(500), c2 VARCHAR(500)) ENGINE=MyISAM;
> +INSERT INTO t1 VALUES ('1','v'),('7','s');
> +CREATE TABLE t2 (pk3 INT PRIMARY KEY, i3 VARCHAR(300), c3 VARCHAR(500)) ENGINE=MyISAM;
> +INSERT INTO t2 VALUES (1,'7','g'),(2,'4','p'),(3,'1','q');
> +INSERT INTO t2 VALUES (4,'7','g'),(5,'4','p'),(6,'1','q');
> +INSERT INTO t2 VALUES (16,'7','g'),(17,'4','p'),(28,'1','q');
> +#
> +# Hash join + table Scan on t2
> +#
> +set join_buffer_size=1024;
> +set join_buffer_space_limit=2048;
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 ALL NULL NULL NULL NULL 2
> +1 SIMPLE t2 ALL NULL NULL NULL NULL 9 Using where
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +i2 c2 pk3 i3 c3
> +1 v NULL NULL NULL
> +7 s NULL NULL NULL
> +set join_buffer_space_limit=262144;
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 ALL NULL NULL NULL NULL 2
> +1 SIMPLE t2 hash_ALL NULL #hash#$hj 503 test.t1.c2 9 Using where; Using join buffer (flat, BNLH join)
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +i2 c2 pk3 i3 c3
> +1 v NULL NULL NULL
> +7 s NULL NULL NULL
> +#
> +# HASH join + ref access on t2
> +#
> +ALTER TABLE t2 ADD KEY k1(c3);
> +set join_buffer_size=1024;
> +set join_buffer_space_limit=2048;
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 ALL NULL NULL NULL NULL 2
> +1 SIMPLE t2 ref k1 k1 503 test.t1.c2 2 Using where
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +i2 c2 pk3 i3 c3
> +1 v NULL NULL NULL
> +7 s NULL NULL NULL
> +set join_buffer_space_limit=262144;
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 ALL NULL NULL NULL NULL 2
> +1 SIMPLE t2 hash_ALL k1 #hash#k1 503 test.t1.c2 9 Using where; Using join buffer (flat, BNLH join)
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +i2 c2 pk3 i3 c3
> +1 v NULL NULL NULL
> +7 s NULL NULL NULL
> +#
> +# Hash join + index scan on t2
> +#
> +ALTER TABLE t2 DROP KEY k1;
> +ALTER TABLE t2 ADD KEY k1(i3,c3);
> +set join_buffer_size=1024;
> +set join_buffer_space_limit=2048;
> +EXPLAIN SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 ALL NULL NULL NULL NULL 2
> +1 SIMPLE t2 index NULL k1 806 NULL 9 Using where; Using index
> +SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +c3 i3 i2 c2
> +NULL NULL 1 v
> +NULL NULL 7 s
> +set join_buffer_space_limit=262144;
> +EXPLAIN SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 ALL NULL NULL NULL NULL 2
> +1 SIMPLE t2 hash_index NULL #hash#$hj:k1 503:806 test.t1.c2 9 Using where; Using index; Using join buffer (flat, BNLH join)
> +SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +c3 i3 i2 c2
> +NULL NULL 1 v
> +NULL NULL 7 s
> +DROP TABLE t1,t2;
> +#
> +# Hash join + range scan on t2
> +#
> +CREATE TABLE t1 (i2 VARCHAR(500), c2 VARCHAR(500));
> +INSERT INTO t1 VALUES ('1','v'),('7','s');
> +CREATE TABLE t2 (pk3 INT PRIMARY KEY, i3 VARCHAR(300), c3 VARCHAR(500), INDEX(i3,c3));
> +INSERT INTO t2 VALUES (1,'7','g'),(2,'4','s'),(3,'1','q');
> +set join_buffer_size=1024;
> +set join_buffer_space_limit=2048;
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3 = c2) WHERE i3 >= '4';
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 ALL NULL NULL NULL NULL 2 Using where
> +1 SIMPLE t2 range i3 i3 303 NULL 2 Using index condition; Using where
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3 = c2) WHERE i3 >= '4';
> +i2 c2 pk3 i3 c3
> +7 s 2 4 s
> +set join_buffer_space_limit=262144;
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3 = c2) WHERE i3 >= '4';
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 ALL NULL NULL NULL NULL 2 Using where
> +1 SIMPLE t2 hash_range i3 #hash#$hj:i3 503:303 test.t1.c2 2 Using where; Using join buffer (flat, BNLH join)
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3 = c2) WHERE i3 >= '4';
> +i2 c2 pk3 i3 c3
> +7 s 2 4 s
> +DROP TABLE t1,t2;
> +#
> +# Hash join + eq ref access on t2
> +#
> +CREATE TABLE t1 (i2 VARCHAR(500), c2 VARCHAR(500));
> +INSERT INTO t1 VALUES ('1','v'),('7','s');
> +CREATE TABLE t2 (pk3 INT, i3 VARCHAR(300), c3 VARCHAR(500) PRIMARY KEY);
> +INSERT INTO t2 VALUES (1,'7','g'),(2,'4','s'),(3,'1','q');
> +set join_buffer_size=1024;
> +set join_buffer_space_limit=2048;
> +EXPLAIN SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 ALL NULL NULL NULL NULL 2
> +1 SIMPLE t2 eq_ref PRIMARY PRIMARY 502 test.t1.c2 1 Using where
> +SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +c3 i3 i2 c2
> +NULL NULL 1 v
> +s 4 7 s
> +set join_buffer_space_limit=262144;
> +EXPLAIN SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t1 ALL NULL NULL NULL NULL 2
> +1 SIMPLE t2 hash_ALL PRIMARY #hash#PRIMARY 502 test.t1.c2 3 Using where; Using join buffer (flat, BNLH join)
> +SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +c3 i3 i2 c2
> +NULL NULL 1 v
> +s 4 7 s
> +DROP TABLE t1,t2;
> +SET @@join_cache_level= @save_join_cache_level;
> +SET @@join_buffer_size= @save_join_buffer_size;
> +SET @@join_buffer_space_limit= @save_join_buffer_space_limit;
> set @@optimizer_switch=@save_optimizer_switch;
> diff --git a/mysql-test/main/join_cache.test b/mysql-test/main/join_cache.test
> index 91339c2cb21..6670c62516b 100644
> --- a/mysql-test/main/join_cache.test
> +++ b/mysql-test/main/join_cache.test
> @@ -4054,5 +4054,110 @@ where
>
> drop table t1,t2,t3;
>
> +--echo #
> +--echo # MDEV-19620: Changing join_buffer_size causes different results
> +--echo #
> +
> +SET @save_join_cache_level= @@join_cache_level;
> +SET @save_join_buffer_size= @@join_buffer_size;
> +SET @save_join_buffer_space_limit= @@join_buffer_space_limit;
> +SET join_cache_level = 3;
> +
> +CREATE TABLE t1 (i2 VARCHAR(500), c2 VARCHAR(500)) ENGINE=MyISAM;
> +INSERT INTO t1 VALUES ('1','v'),('7','s');
> +CREATE TABLE t2 (pk3 INT PRIMARY KEY, i3 VARCHAR(300), c3 VARCHAR(500)) ENGINE=MyISAM;
> +INSERT INTO t2 VALUES (1,'7','g'),(2,'4','p'),(3,'1','q');
> +INSERT INTO t2 VALUES (4,'7','g'),(5,'4','p'),(6,'1','q');
> +INSERT INTO t2 VALUES (16,'7','g'),(17,'4','p'),(28,'1','q');
> +
> +--echo #
> +--echo # Hash join + table Scan on t2
> +--echo #
> +
> +set join_buffer_size=1024;
> +set join_buffer_space_limit=2048;
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +
> +set join_buffer_space_limit=262144;
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +
> +--echo #
> +--echo # HASH join + ref access on t2
> +--echo #
> +
> +ALTER TABLE t2 ADD KEY k1(c3);
> +set join_buffer_size=1024;
> +set join_buffer_space_limit=2048;
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +
> +set join_buffer_space_limit=262144;
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3=c2);
> +
> +--echo #
> +--echo # Hash join + index scan on t2
> +--echo #
> +ALTER TABLE t2 DROP KEY k1;
> +ALTER TABLE t2 ADD KEY k1(i3,c3);
> +
> +set join_buffer_size=1024;
> +set join_buffer_space_limit=2048;
> +
> +EXPLAIN SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +
> +set join_buffer_space_limit=262144;
> +EXPLAIN SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +DROP TABLE t1,t2;
> +
> +--echo #
> +--echo # Hash join + range scan on t2
> +--echo #
> +
> +CREATE TABLE t1 (i2 VARCHAR(500), c2 VARCHAR(500));
> +INSERT INTO t1 VALUES ('1','v'),('7','s');
> +CREATE TABLE t2 (pk3 INT PRIMARY KEY, i3 VARCHAR(300), c3 VARCHAR(500), INDEX(i3,c3));
> +INSERT INTO t2 VALUES (1,'7','g'),(2,'4','s'),(3,'1','q');
> +
> +set join_buffer_size=1024;
> +set join_buffer_space_limit=2048;
> +
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3 = c2) WHERE i3 >= '4';
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3 = c2) WHERE i3 >= '4';
> +
> +set join_buffer_space_limit=262144;
> +EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON (c3 = c2) WHERE i3 >= '4';
> +SELECT * FROM t1 LEFT JOIN t2 ON (c3 = c2) WHERE i3 >= '4';
> +
> +DROP TABLE t1,t2;
> +
> +--echo #
> +--echo # Hash join + eq ref access on t2
> +--echo #
> +
> +CREATE TABLE t1 (i2 VARCHAR(500), c2 VARCHAR(500));
> +INSERT INTO t1 VALUES ('1','v'),('7','s');
> +CREATE TABLE t2 (pk3 INT, i3 VARCHAR(300), c3 VARCHAR(500) PRIMARY KEY);
> +INSERT INTO t2 VALUES (1,'7','g'),(2,'4','s'),(3,'1','q');
> +
> +set join_buffer_size=1024;
> +set join_buffer_space_limit=2048;
> +
> +EXPLAIN SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +
> +set join_buffer_space_limit=262144;
> +EXPLAIN SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +SELECT c3,i3, i2,c2 FROM t1 LEFT JOIN t2 ON (c3 = c2);
> +DROP TABLE t1,t2;
> +
> +SET @@join_cache_level= @save_join_cache_level;
> +SET @@join_buffer_size= @save_join_buffer_size;
> +SET @@join_buffer_space_limit= @save_join_buffer_space_limit;
> +
> # The following command must be the last one in the file
> set @@optimizer_switch=@save_optimizer_switch;
> diff --git a/sql/sql_select.cc b/sql/sql_select.cc
> index 203422f0b43..8b7deb98dc6 100644
> --- a/sql/sql_select.cc
> +++ b/sql/sql_select.cc
> @@ -111,6 +111,7 @@ static bool best_extension_by_limited_search(JOIN *join,
> uint prune_level,
> uint use_cond_selectivity);
> static uint determine_search_depth(JOIN* join);
> +static void pick_table_access_method(JOIN_TAB *tab);
> C_MODE_START
> static int join_tab_cmp(const void *dummy, const void* ptr1, const void* ptr2);
> static int join_tab_cmp_straight(const void *dummy, const void* ptr1, const void* ptr2);
> @@ -10081,6 +10082,7 @@ static bool create_ref_for_key(JOIN *join, JOIN_TAB *j,
> j->ref.disable_cache= FALSE;
> j->ref.null_ref_part= NO_REF_PART;
> j->ref.const_ref_part_map= 0;
> + j->ref.not_null_keyparts= 0;
> keyuse=org_keyuse;
>
> store_key **ref_key= j->ref.key_copy;
> @@ -10173,23 +10175,12 @@ static bool create_ref_for_key(JOIN *join, JOIN_TAB *j,
> }
> } /* not ftkey */
> *ref_key=0; // end_marker
> + j->ref.not_null_keyparts= not_null_keyparts;
> if (j->type == JT_FT)
> DBUG_RETURN(0);
> - ulong key_flags= j->table->actual_key_flags(keyinfo);
> if (j->type == JT_CONST)
> j->table->const_table= 1;
> - else if (!((keyparts == keyinfo->user_defined_key_parts &&
> - (
> - (key_flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME ||
> - /* Unique key and all keyparts are NULL rejecting */
> - ((key_flags & HA_NOSAME) && keyparts == not_null_keyparts)
> - )) ||
> - /* true only for extended keys */
> - (keyparts > keyinfo->user_defined_key_parts &&
> - MY_TEST(key_flags & HA_EXT_NOSAME) &&
> - keyparts == keyinfo->ext_key_parts)
> - ) ||
> - null_ref_key)
> + else if (!j->is_eq_ref_access()|| null_ref_key)
> {
> /* Must read with repeat */
> j->type= null_ref_key ? JT_REF_OR_NULL : JT_REF;
> @@ -11582,11 +11573,25 @@ void set_join_cache_denial(JOIN_TAB *join_tab)
> don't do join buffering for the first table in sjm nest.
> */
> join_tab[-1].next_select= sub_select;
> - if (join_tab->type == JT_REF && join_tab->is_ref_for_hash_join())
> + if ((join_tab->type == JT_REF || join_tab->type == JT_HASH) &&
> + join_tab->is_ref_for_hash_join())
> {
> join_tab->type= JT_ALL;
> join_tab->ref.key_parts= 0;
Do we ever get here if join_tab->type == JT_REF? I think we don't.
If this is so, please remove it.
> }
> +
> + if (join_tab->type == JT_HASH && !join_tab->is_ref_for_hash_join())
> + {
> + join_tab->type= join_tab->is_eq_ref_access() ? JT_EQ_REF : JT_REF;
> + pick_table_access_method(join_tab);
> + }
Also please add a comment explaining that the join optimizer's choice was
ref access and we fall back to that.
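Something along these lines, perhaps (the wording of the comment is just a
suggestion):

  if (join_tab->type == JT_HASH && !join_tab->is_ref_for_hash_join())
  {
    /*
      The join optimizer chose ref/eq_ref access for this table; it was
      converted to hash join only to use the join buffer. Now that
      buffering is denied, fall back to the lookup the optimizer picked.
    */
    join_tab->type= join_tab->is_eq_ref_access() ? JT_EQ_REF : JT_REF;
    pick_table_access_method(join_tab);
  }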
> +
> + if (join_tab->type == JT_HASH_NEXT)
> + {
> + join_tab->type = JT_NEXT;
> + DBUG_ASSERT(join_tab->ref.key_parts == 0);
> + }
> +
> join_tab->join->return_tab= join_tab;
> }
> }
> @@ -27954,6 +27959,42 @@ void JOIN_TAB::partial_cleanup()
> }
>
>
> +/*
> + @brief
> + Check if the access method for the table is EQ_REF access or not
> +
> + @retval
> + TRUE EQ_REF access
> + FALSE Otherwise
> +*/
> +bool JOIN_TAB::is_eq_ref_access()
> +{
> +
> + KEY *keyinfo;
> + if (!is_hash_join_key_no(ref.key))
> + keyinfo= table->key_info + ref.key;
> + else
> + keyinfo= hj_key;
This code now looks as if it were possible that hash join and eq_ref access
are used at the same time.
Is this really possible? I've tried to catch an example of this with MTR and
wasn't able to.
Maybe we should just return false if hash join is used, and add a comment
about this?
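That is, something like this (the same checks as in your patch, only with an
early exit for the hash join pseudo-key):

  bool JOIN_TAB::is_eq_ref_access()
  {
    if (is_hash_join_key_no(ref.key))
      return false;               /* a hash join key is never unique */

    KEY *keyinfo= table->key_info + ref.key;
    uint keyparts= ref.key_parts;
    ulong key_flags= table->actual_key_flags(keyinfo);
    return (keyparts == keyinfo->user_defined_key_parts &&
            ((key_flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME ||
             /* Unique key and all keyparts are NULL rejecting */
             ((key_flags & HA_NOSAME) &&
              keyparts == ref.not_null_keyparts))) ||
           /* true only for extended keys */
           (keyparts > keyinfo->user_defined_key_parts &&
            MY_TEST(key_flags & HA_EXT_NOSAME) &&
            keyparts == keyinfo->ext_key_parts);
  }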
> +
> + uint keyparts= ref.key_parts;
> + ulong key_flags= table->actual_key_flags(keyinfo);
> + if ( (keyparts == keyinfo->user_defined_key_parts &&
> + (
> + (key_flags & (HA_NOSAME | HA_NULL_PART_KEY)) == HA_NOSAME ||
> + /* Unique key and all keyparts are NULL rejecting */
> + ((key_flags & HA_NOSAME) && keyparts == ref.not_null_keyparts)
> + )
> + ) ||
> + /* true only for extended keys */
> + (keyparts > keyinfo->user_defined_key_parts &&
> + MY_TEST(key_flags & HA_EXT_NOSAME) &&
> + keyparts == keyinfo->ext_key_parts)
> + )
> + return true;
> + return false;
> +}
> +
> +
> /**
> @} (end of group Query_Optimizer)
> */
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net