developers
February 2021
- 7 participants
- 17 discussions
Re: [Maria-developers] 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
by Sergei Golubchik 21 Jul '21
Hi, Aleksey!
Don't be confused, the commit header and the comment come from your last
commit, but the diff below includes all three commits that mention MDEV-18734.
On Nov 02, Aleksey Midenkov wrote:
> revision-id: 2f9bbf392072 (mariadb-10.2.31-543-g2f9bbf392072)
> parent(s): 1f4960e3f2ac
> author: Aleksey Midenkov <midenok(a)gmail.com>
> committer: Aleksey Midenkov <midenok(a)gmail.com>
> timestamp: 2020-11-02 14:26:04 +0300
> message:
>
> MDEV-18734 optimization by taking ownership via String::swap()
>
> ha_partition stores records in array of m_ordered_rec_buffer and uses
> it for prio queue in ordered index scan. When the records are restored
> from the array the blob buffers may be already freed or
> rewritten. This can happen when blob buffers are cached in
> Field_blob::value or Field_blob::read_value.
>
> Previous solution worked via copying blob buffer into
> mem_root-allocated buffer. That was not optimal as it requires more
> memory and processing time.
>
> This patch avoids memory copy by taking temporary ownership of cached
> blob buffers via String::swap(). It also fixes problem when
> value.str_length is wrong after Field_blob::store().
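The ownership trick described in this commit message can be shown in isolation. Below is a minimal sketch. std::string stands in for the server's String class, and the hypothetical types CachedField and SavedBlob stand in for Field_blob and Ordered_blob_storage. swap() exchanges the two strings in constant time (std::string::swap has constant complexity), so the cost does not grow with the blob length, unlike copying into a mem_root buffer.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a field that caches blob bytes in a reusable buffer
// (the role Field_blob::value / read_value play in the patch).
struct CachedField {
  std::string value;
};

// Hypothetical stand-in for the per-record storage kept next to a saved record.
struct SavedBlob {
  std::string blob;
};

// Take temporary ownership: the field's buffer moves into the slot, and the
// field is left with whatever the slot held before (empty on first use).
void save_blob(CachedField &f, SavedBlob &s) { f.value.swap(s.blob); }

// Hand the buffer back when the saved record is restored from the queue.
void restore_blob(CachedField &f, SavedBlob &s) { f.value.swap(s.blob); }
```

After save_blob(), reading the next row may freely overwrite the field's cache without touching the saved bytes. restore_blob() then swaps them back, leaving the newer row's buffer in the slot.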
> diff --git a/mysql-test/suite/federated/federated_partition.test b/mysql-test/suite/federated/federated_partition.test
> index ef1e27ec5054..33ee025442f4 100644
> --- a/mysql-test/suite/federated/federated_partition.test
> +++ b/mysql-test/suite/federated/federated_partition.test
> @@ -50,4 +50,30 @@ drop table federated.t1_2;
>
> --echo End of 5.1 tests
>
> +--echo #
> +--echo # MDEV-18734 ASAN heap-use-after-free upon sorting by blob column from partitioned table
> +--echo #
> +connection slave;
> +use federated;
> +create table t1_1 (x int, b text, key(x));
> +create table t1_2 (x int, b text, key(x));
> +connection master;
> +--replace_result $SLAVE_MYPORT SLAVE_PORT
> +eval create table t1 (x int, b text, key(x)) engine=federated
> + partition by range columns (x) (
> + partition p1 values less than (40) connection='mysql://root@127.0.0.1:$SLAVE_MYPORT/federated/t1_1',
> + partition pn values less than (maxvalue) connection='mysql://root@127.0.0.1:$SLAVE_MYPORT/federated/t1_2'
> +);
> +insert t1 values (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8);
> +insert t1 select x + 8, x + 8 from t1;
> +insert t1 select x + 16, x + 16 from t1;
> +insert t1 select x + 49, repeat(x + 49, 100) from t1;
> +--echo # This produces wrong result before MDEV-17573
> +select * from t1;
What do you mean, a simple `select * from t1` produces an incorrect result?
> +flush tables;
> +select x, b from t1 where x > 30 and x < 60 order by b;
Is this an ASAN-only test?
> +drop table t1;
> +connection slave;
> +drop table t1_1, t1_2;
> +
> source include/federated_cleanup.inc;
> diff --git a/mysql-test/suite/vcol/t/partition.test b/mysql-test/suite/vcol/t/partition.test
> index 889724fb1c5c..f10ee0491ccc 100644
> --- a/mysql-test/suite/vcol/t/partition.test
> +++ b/mysql-test/suite/vcol/t/partition.test
> @@ -30,3 +30,28 @@ subpartition by hash(v) subpartitions 3 (
> insert t1 set i= 0;
> set statement sql_mode= '' for update t1 set i= 1, v= 2;
> drop table t1;
> +
> +--echo #
> +--echo # MDEV-18734 ASAN heap-use-after-free in my_strnxfrm_simple_internal upon update on versioned partitioned table
> +--echo #
> +--echo # Cover queue_fix() in ha_partition::handle_ordered_index_scan()
> +create table t1 (
> + x int auto_increment primary key,
> + b text, v mediumtext as (b) virtual,
> + index (v(10))
> +) partition by range columns (x) (
> + partition p1 values less than (3),
> + partition p2 values less than (6),
> + partition pn values less than (maxvalue));
> +insert into t1 (b) values ('q'), ('a'), ('b');
> +update t1 set b= 'bar' where v > 'a';
> +drop table t1;
This test didn't fail for me in an ASAN build
on the vanilla 10.5 without your fix.
That is, I'm not sure it actually tests the fix.
> +
> +--echo # Cover return_top_record() in ha_partition::handle_ordered_index_scan()
> +create table t1 (x int primary key, b tinytext, v text as (b) virtual)
> +partition by range columns (x) (
> + partition p1 values less than (4),
> + partition pn values less than (maxvalue));
> +insert into t1 (x, b) values (1, ''), (2, ''), (3, 'a'), (4, 'b');
> +update t1 set b= 'bar' where x > 0 order by v limit 2;
This fails without the fix, all right.
Still, I wonder whether it's possible to create a test that'll fail
in a normal optimized build w/o ASAN.
> +drop table t1;
> diff --git a/sql/field.cc b/sql/field.cc
> index bdaaecc20269..0fd40c979d2c 100644
> --- a/sql/field.cc
> +++ b/sql/field.cc
> @@ -8310,6 +8310,7 @@ int Field_blob::store(const char *from,uint length,CHARSET_INFO *cs)
> copy_length= copier.well_formed_copy(field_charset,
> (char*) value.ptr(), new_length,
> cs, from, length);
> + value.length(copy_length);
Good! Could this be a separate bug with its own test case?
> Field_blob::store_length(copy_length);
> bmove(ptr+packlength,(uchar*) &tmp,sizeof(char*));
>
> diff --git a/sql/ha_partition.cc b/sql/ha_partition.cc
> index 9d82df0235c7..7c658082d397 100644
> --- a/sql/ha_partition.cc
> +++ b/sql/ha_partition.cc
> @@ -5125,14 +5128,14 @@ bool ha_partition::init_record_priority_queue()
> uint alloc_len;
> uint used_parts= bitmap_bits_set(&m_part_info->read_partitions);
> /* Allocate record buffer for each used partition. */
> - m_priority_queue_rec_len= m_rec_length + PARTITION_BYTES_IN_POS;
> + m_priority_queue_rec_len= m_rec_length + ORDERED_REC_OFFSET;
> if (!m_using_extended_keys)
> m_priority_queue_rec_len += m_file[0]->ref_length;
> alloc_len= used_parts * m_priority_queue_rec_len;
> /* Allocate a key for temporary use when setting up the scan. */
> alloc_len+= table_share->max_key_length;
>
> - if (!(m_ordered_rec_buffer= (uchar*)my_malloc(alloc_len, MYF(MY_WME))))
> + if (!(m_ordered_rec_buffer= (uchar*) alloc_root(&m_ordered_root, alloc_len)))
I don't see why you need a memroot here. A memroot is needed when you
allocate, at different points in time, some initially unpredictable
amount of memory that has the same lifetime and needs to be
freed all at once.
But here it seems that you only allocate once. You could just make
m_priority_queue_rec_len = m_rec_length + PARTITION_BYTES_IN_POS +
table->s->blob_fields * sizeof(Ordered_blob_storage)
(plus necessary alignment).
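One way to read this suggestion: compute a per-record slot length that already includes room for the blob storage, then make a single allocation as before. The sketch below assumes the blob storage sits after the record and partition-id bytes. BlobStorageStub is a hypothetical placeholder for Ordered_blob_storage, and ref_length for the non-extended-keys case is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical placeholder for Ordered_blob_storage; the real struct differs.
struct BlobStorageStub {
  void *ptr;
  std::size_t length;
  bool set_read_value;
};

// Round n up to a multiple of align; align must be a power of two.
constexpr std::size_t align_up(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// Length of one queue slot: record bytes plus partition-id bytes, padded so the
// trailing blob-storage array is aligned. sizeof() is always a multiple of
// alignof(), so the total stays aligned and back-to-back slots remain aligned.
std::size_t priority_queue_rec_len(std::size_t rec_length,
                                   std::size_t bytes_in_pos,
                                   std::size_t blob_fields) {
  std::size_t len = align_up(rec_length + bytes_in_pos, alignof(BlobStorageStub));
  return len + blob_fields * sizeof(BlobStorageStub);
}

// A single allocation for all used partitions plus one scratch key.
unsigned char *alloc_ordered_buffer(std::size_t used_parts, std::size_t rec_len,
                                    std::size_t max_key_length) {
  return static_cast<unsigned char *>(
      std::malloc(used_parts * rec_len + max_key_length));
}
```

With this layout the buffer is released with a single free(), so no memroot is needed.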
> DBUG_RETURN(true);
>
> /*
> @@ -6178,7 +6203,11 @@ int ha_partition::handle_ordered_index_scan(uchar *buf, bool reverse_order)
> */
> error= file->read_range_first(m_start_key.key? &m_start_key: NULL,
> end_range, eq_range, TRUE);
> - memcpy(rec_buf_ptr, table->record[0], m_rec_length);
> + if (!error)
> + {
> + memcpy(rec_buf_ptr, table->record[0], m_rec_length);
> + }
Did you have a bug because of that? Did something not work?
> +
> reverse_order= FALSE;
> break;
> }
> @@ -6310,6 +6349,43 @@ int ha_partition::handle_ordered_index_scan_key_not_found()
> }
>
>
> +void ha_partition::swap_blobs(uchar * rec_buf, Ordered_blob_storage ** storage, bool restore)
> +{
> + uint *ptr, *end;
> + uint blob_n= 0;
> + table->move_fields(table->field, rec_buf, table->record[0]);
> + for (ptr= table->s->blob_field, end= ptr + table->s->blob_fields;
> + ptr != end; ++ptr, ++blob_n)
> + {
> + DBUG_ASSERT(*ptr < table->s->fields);
> + Field_blob *blob= (Field_blob*) table->field[*ptr];
> + DBUG_ASSERT(blob->flags & BLOB_FLAG);
> + DBUG_ASSERT(blob->field_index == *ptr);
> + if (!bitmap_is_set(table->read_set, *ptr) || blob->is_null())
> + continue;
> +
> + Ordered_blob_storage &s= *storage[blob_n];
> +
> + if (restore)
> + {
> + if (!s.blob.is_empty())
> + blob->swap(s.blob, s.set_read_value);
This looks somewhat risky: why does blob.length == 0 mean that you don't
need to restore the value?
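To make the concern concrete: if a zero-length (but non-NULL) blob value was swapped into the storage slot, an emptiness test cannot tell it apart from "nothing was saved", and the field keeps whatever a later row left in it. The minimal model below uses std::string and a hypothetical explicit `saved` flag. It models only the guard, not the real Field_blob::swap().

```cpp
#include <cassert>
#include <string>

// Hypothetical storage slot with an explicit marker instead of relying on
// blob.empty() to mean "nothing to restore".
struct Slot {
  std::string blob;
  bool saved = false;
};

// Mirrors the `if (!s.blob.is_empty())` guard: an empty saved value is skipped.
bool restore_by_emptiness(std::string &field, Slot &s) {
  if (s.blob.empty())
    return false;
  field.swap(s.blob);
  return true;
}

// Restores whenever something was actually saved, including an empty value.
bool restore_by_flag(std::string &field, Slot &s) {
  if (!s.saved)
    return false;
  field.swap(s.blob);
  s.saved = false;
  return true;
}
```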
> + }
> + else
> + {
> + bool set_read_value;
> + String *cached= blob->cached(set_read_value);
> + if (cached)
> + {
> + cached->swap(s.blob);
> + s.set_read_value= set_read_value;
Is it indeed possible here for the value to be either in `value` or in
`read_value`?
When does each case happen?
> + }
> + }
> + }
> + table->move_fields(table->field, table->record[0], rec_buf);
> +}
> +
> +
> /*
> Common routine to handle index_next with ordered results
>
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Re: [Maria-developers] 27060eb6ba5: MDEV-21916: COM_STMT_BULK_EXECUTE with RETURNING insert wrong values
by Sergei Golubchik 10 Jun '21
Hi, Oleksandr!
On Oct 12, Oleksandr Byelkin wrote:
> revision-id: 27060eb6ba5 (mariadb-10.5.4-220-g27060eb6ba5)
> parent(s): 861cd4ce286
> author: Oleksandr Byelkin <sanja(a)mariadb.com>
> committer: Oleksandr Byelkin <sanja(a)mariadb.com>
> timestamp: 2020-10-07 15:22:41 +0200
> message:
>
> MDEV-21916: COM_STMT_BULK_EXECUTE with RETURNING insert wrong values
>
> To allocate new net buffer to avoid changing bufer we are reading.
You still didn't clarify the commit comment.
> diff --git a/sql/sql_delete.cc b/sql/sql_delete.cc
> index 7280236e43f..5aaff3cf623 100644
> --- a/sql/sql_delete.cc
> +++ b/sql/sql_delete.cc
> @@ -685,8 +685,14 @@ bool mysql_delete(THD *thd, TABLE_LIST *table_list, COND *conds,
> !table->prepare_triggers_for_delete_stmt_or_event())
> will_batch= !table->file->start_bulk_delete();
>
> - if (returning)
> + /*
> + thd->get_stmt_da()->is_set() means first iteration of prepared statement
> + with array binding operation execution (non optimized so it is not
> + INSERT)
> + */
> + if (returning && !thd->get_stmt_da()->is_set())
> {
> + DBUG_ASSERT(thd->lex->sql_command != SQLCOM_INSERT);
A strange assert to see in sql_delete.cc :)
Can one even reach mysql_delete() with SQLCOM_INSERT?
Not just with returning and !thd->get_stmt_da()->is_set(), but anywhere?
> if (result->send_result_set_metadata(returning->item_list,
> Protocol::SEND_NUM_ROWS | Protocol::SEND_EOF))
> goto cleanup;
> diff --git a/sql/sql_error.cc b/sql/sql_error.cc
> index b3ef0d89a98..a753af2b34d 100644
> --- a/sql/sql_error.cc
> +++ b/sql/sql_error.cc
> @@ -380,16 +380,33 @@ Diagnostics_area::set_eof_status(THD *thd)
> if (unlikely(is_error() || is_disabled()))
> return;
>
> + if (m_status == DA_EOF_BULK)
> + {
> /*
> If inside a stored procedure, do not return the total
> number of warnings, since they are not available to the client
> anyway.
> */
> + if (!thd->spcont)
> + m_statement_warn_count+= current_statement_warn_count();
> + }
> + else
> + {
> + /*
> + If inside a stored procedure, do not return the total
> + number of warnings, since they are not available to the client
> + anyway.
> + */
I don't think it helps to duplicate the comment. You could just put it
once before the if.
> if (thd->spcont)
> {
> m_statement_warn_count= 0;
> + m_affected_rows= 0;
Why do you reset m_affected_rows too?
> }
> else
> m_statement_warn_count= current_statement_warn_count();
> + m_status= (is_bulk_op() ? DA_EOF_BULK : DA_EOF);
> + }
Do we have tests for bulk operations and CALL?
>
> - m_status= DA_EOF;
> DBUG_VOID_RETURN;
> }
>
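With the shared comment stated once before the `if`, as suggested above, the warning bookkeeping in this hunk reduces to the model below. It is a hypothetical simplification: the enum and struct names are invented, and m_affected_rows and the error/disabled early return are left out. It shows only how the count accumulates across bulk iterations.

```cpp
#include <cassert>

enum class DaStatus { Empty, Eof, EofBulk };

// Hypothetical, heavily simplified model of Diagnostics_area::set_eof_status().
struct DiagModel {
  DaStatus status = DaStatus::Empty;
  unsigned statement_warn_count = 0;

  // in_sp ~ thd->spcont != NULL, bulk ~ is_bulk_op(),
  // current ~ current_statement_warn_count()
  void set_eof_status(bool in_sp, bool bulk, unsigned current) {
    // If inside a stored procedure, do not return the total number of
    // warnings, since they are not available to the client anyway.
    if (status == DaStatus::EofBulk) {
      if (!in_sp)
        statement_warn_count += current;  // later bulk iterations accumulate
    } else {
      statement_warn_count = in_sp ? 0 : current;
      status = bulk ? DaStatus::EofBulk : DaStatus::Eof;
    }
  }
};
```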
> diff --git a/sql/sql_prepare.cc b/sql/sql_prepare.cc
> index ecb56e70f88..c144d3a8d7e 100644
> --- a/sql/sql_prepare.cc
> +++ b/sql/sql_prepare.cc
> @@ -4357,24 +4361,37 @@ Prepared_statement::execute_bulk_loop(String *expanded_query,
>
> + /*
> + Here second buffer for not optimized commands,
> + optimized commands do it inside thier internal loop.
> + */
> + if (!(sql_command_flags[lex->sql_command] & CF_SP_BULK_OPTIMIZED) &&
Why "SP" and not "PS"?
> + this->lex->has_returning())
What about CALL? It won't have lex->has_returning(). What about SELECT?
Can you bind an array to a parameter in SELECT or CALL?
> + {
> + // Above check can be true for SELECT in future
> + DBUG_ASSERT(lex->sql_command != SQLCOM_SELECT);
How can SQLCOM_SELECT have lex->has_returning()?
> + readbuff= thd->net.buff; // old buffer
> + if (net_allocate_new_packet(&thd->net, thd, MYF(MY_THREAD_SPECIFIC)))
> + {
> + readbuff= NULL; // failure, net_allocate_new_packet keeps old buffer
> + goto err;
> + }
> }
>
> #ifndef EMBEDDED_LIBRARY
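As far as one can tell from the patch, the problem is that with RETURNING the rows sent back to the client are written into the same net buffer that still holds the array-bound parameters being read. The minimal model below illustrates the fix's idea. The names are hypothetical, and std::unique_ptr replaces the server's own allocation. Allocating the new buffer before detaching the old one means a failed allocation (an exception here) leaves net.buff untouched. That is similar to what the comment above says net_allocate_new_packet() does on failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical model of a connection whose buffer is used both for the
// incoming packet and for outgoing replies.
struct NetModel {
  std::unique_ptr<char[]> buff;
  std::size_t size;
};

// Give the connection a fresh buffer for writing and hand the old one back,
// so the caller can keep reading bulk parameters from it.
std::unique_ptr<char[]> switch_to_new_packet(NetModel &net) {
  std::unique_ptr<char[]> fresh(new char[net.size]);  // may throw; net unchanged
  std::unique_ptr<char[]> old = std::move(net.buff);
  net.buff = std::move(fresh);
  return old;
}
```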
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Re: [Maria-developers] ef2519fee4e: MDEV-16546 System versioning setting to allow history modification
by Aleksey Midenkov 08 Jun '21
Hello, Sergei!
On Fri, May 3, 2019 at 8:43 PM Sergei Golubchik <serg(a)mariadb.com> wrote:
>
> Hi, Aleksey!
>
> On May 03, Aleksey Midenkov wrote:
> > revision-id: ef2519fee4e (versioning-1.0.5-17-gef2519fee4e)
> > parent(s): 56145be2951
> > author: Aleksey Midenkov <midenok(a)gmail.com>
> > committer: Aleksey Midenkov <midenok(a)gmail.com>
> > timestamp: 2018-06-28 13:42:09 +0300
> > message:
> >
> > MDEV-16546 System versioning setting to allow history modification
> >
> > 1. Add server variable system_versioning_modify_history which will
> > allow to set values for row_start, row_end in DML operations.
> >
> > 2. If secure_timestamp is YES or REPLICATION,
> > system_versioning_modify_history does not have effect. If
> > secure_timestamp is SUPER, system_versioning_modify_history requires
> > special privilege (same as for setting current timestamp).
>
> I thought more about this idea. We don't really want to have the history
> editable, do we?
Well, I'm thinking about rolling back table data to a specific point in
time. That could be a useful feature.
> But it's needed for replication, to keep the master and
> slave identical. That's what secure_timestamp is for.
>
> The idea was that this new variable, system_versioning_modify_history,
> will be just a convenience feature, it will not allow history editing
> any more than one can do without it.
>
> But now I suspect that even with secure_timestamp=NO one cannot truly
> edit history. One can only insert new rows with arbitrary timestamps.
> For example, to insert a row with row_start=1000 and row_end=2000, one
> needs to do (if secure_timestamp=NO):
>
> set timestamp=1000;
> insert row;
> set timestamp=2000;
> delete row;
>
> But I don't see how one can update or delete a history row with
> secure_timestamp=NO.
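The four-statement recipe above can be modeled in a few lines. The model below is hypothetical and encodes only the rule described here. DML stamps row_start and row_end from the session timestamp, and DELETE affects only current rows. Consequently, a later DELETE cannot alter a row that is already history.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical model of one row of a system-versioned table.
struct VersionedRow {
  int value;
  std::int64_t row_start;
  std::int64_t row_end;
};

// row_end value that marks a current (non-history) row.
constexpr std::int64_t kCurrent = std::numeric_limits<std::int64_t>::max();

struct VersionedTable {
  std::vector<VersionedRow> rows;
  std::int64_t timestamp = 0;  // plays the role of `set timestamp=...`

  // INSERT: row_start comes from the session timestamp, not from the user.
  void insert(int v) { rows.push_back({v, timestamp, kCurrent}); }

  // DELETE: only closes current rows; history rows are never touched.
  void remove(int v) {
    for (auto &r : rows)
      if (r.value == v && r.row_end == kCurrent)
        r.row_end = timestamp;
  }
};
```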
>
> Now, with a SUPER privilege and secure_timestamp=NO or SUPER, one can
> use the BINLOG command and truly edit the history arbitrarily, by faking
> row events.
I don't really see why this is so important: since it is already
limited by configuration and privilege, we are fine.
Everything can be changed at the filesystem level, after all.
>
> The conclusion, I believe, is that system_versioning_modify_history
> should allow INSERTs when secure_timestamp=NO, and it should allow
> UPDATE/DELETE only for a SUPER user when secure_timestamp=NO or SUPER.
I don't see a reason to argue about that. The only thing that is not
clear is why we don't allow INSERTs when secure_timestamp=SUPER.
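The rule proposed in the quoted conclusion can be written down as a small decision function. This is a hypothetical encoding of the proposal, not server code. It assumes system_versioning_modify_history is enabled, and it reduces the privilege check to a single has_super flag.

```cpp
#include <cassert>

enum class SecureTimestamp { No, Super, Replication, Yes };
enum class Dml { Insert, Update, Delete };

// Proposed rule: INSERT only when secure_timestamp=NO; UPDATE/DELETE of
// history only for a SUPER user when secure_timestamp is NO or SUPER.
bool may_modify_history(SecureTimestamp st, Dml op, bool has_super) {
  switch (op) {
    case Dml::Insert:
      return st == SecureTimestamp::No;
    case Dml::Update:
    case Dml::Delete:
      return has_super &&
             (st == SecureTimestamp::No || st == SecureTimestamp::Super);
  }
  return false;
}
```

Under this encoding, INSERT with secure_timestamp=SUPER is rejected even for a SUPER user. That is the case questioned in the reply.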
>
> The second thing I don't like at all, is when a table is created like
>
> CREATE TABLE t1 (a int) WITH SYSTEM VERSIONING
>
> with row_start/row_end implicit. You don't have it in the test, but
> anyway one should be able to load history into such a table, while the
> table does not have row_start and row_end columns. From the user point
> of view these columns don't exist, they're pseudo-columns, like ROWID.
> They just cannot be insertable-into, conceptually. But a user will want
> to restore the history, right? I don't have a solution for this yet :(
> Any ideas?
We don't have to follow the concept if it doesn't help us. Since we
have physical row_start/row_end columns, we don't have to pretend they don't
exist. Who would benefit from that?
>
> See below a couple of minor comments about the patch itself.
>
> ...
These are going to be fixed.
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and security(a)mariadb.org
--
All the best,
Aleksey Midenkov
@midenok
31 May '21
Hi Alexey and everyone,
I was looking at whether it's possible to use JSON_TABLE to extract portions of a JSON
document. Apparently it is possible in MySQL with the JSON datatype:
Q1:
select *
from
json_table('[{"color": "blue", "price": { "high": 10, "low": 5}},
{"color": "red", "price": { "high": 20, "low": 8}}]',
'$[*]' columns(color varchar(100) path '$.color',
price json path '$.price'
)
) as T;
+-------+------------------------+
| color | price |
+-------+------------------------+
| blue | {"low": 5, "high": 10} |
| red | {"low": 8, "high": 20} |
+-------+------------------------+
Note that if one uses any datatype other than JSON, they get NULLs:
Q2:
select *
from
json_table('[{"color": "blue", "price": { "high": 10, "low": 5}},
{"color": "red", "price": { "high": 20, "low": 8}}]',
'$[*]' columns(color varchar(100) path '$.color',
price text path '$.price'
)
) as T;
+-------+-------+
| color | price |
+-------+-------+
| blue | NULL |
| red | NULL |
+-------+-------+
Oracle-the-database doesn't yet(*) have a JSON datatype. So I can only run Q2
and then I get NULLs in the price column.
MariaDB accepts JSON as datatype so query Q1 is accepted.
However, the logic in the MDEV-17399 code doesn't support dumping a portion
of a JSON document, so one gets empty strings in the price column.
Should we support Q1 with JSON output in the price column? If yes, should we
do it within the scope of MDEV-17399 or create another task for this?
(*) I see this:
https://docs.oracle.com/en/database/oracle/oracle-database/20/newft/new-jso…
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
Re: [Maria-developers] 21eb8969ce9: Improved storage size for Item, Field and some other classes
by Sergei Golubchik 14 May '21
Hi, Monty!
Looks OK, but again it doesn't seem that you've squashed the intermediate commits
as you said you did.
Bit fields and non-existent commits in columnstore: this is clearly an
intermediate work-in-progress state, all fixed in your later commits.
On Sep 08, Michael Widenius wrote:
> revision-id: 21eb8969ce9 (mariadb-10.5.2-270-g21eb8969ce9)
> parent(s): c3ecf0d6243
> author: Michael Widenius <michael.widenius(a)gmail.com>
> committer: Michael Widenius <michael.widenius(a)gmail.com>
> timestamp: 2020-09-02 20:58:34 +0300
> message:
>
> Improved storage size for Item, Field and some other classes
>
> - Changed order of class fields to remove dead alignment space.
> - Changed bool fields in Item to bit fields.
> - Used packed enum's for some fields in common classes
> - Removed not used Item::rsize.
> - Changed some class variables from uint/int to smaller type int's.
> - Ensured that field_index is uint16 in all classes and functions. Fixed
> also that we proparly compare with NO_CACHED_FIELD_INDEX when checking
> if variable is not set.
> - Removed checking of highest bit of unireg_check (has not been used in
> a long time)
> - Fixed wrong arguments to make_cond_for_table() for join_tab_idx_arg
> from false to 0.
>
> One of the result was reducing the size if class Item with ~24 bytes
...
> @@ -929,18 +904,48 @@ class Item: public Value_source,
> +
> + /*
> + str_values's main purpose is to be used to cache the value in
> + save_in_field
> + */
> + String str_value;
> +
> + LEX_CSTRING name; /* Name of item */
> + /* Original item name (if it was renamed)*/
> + const char *orig_name;
> +
> + uint32 /* All common bool variables for Item stored here */
> + maybe_null:1, /* If item may be null */
> + in_rollup:1, /* If used in GROUP BY list of a query with ROLLUP */
> + with_param:1, /* True if Item contains an SP parameter */
> + with_window_func:1, /* True if item contains a window func */
> + with_field:1, /* True if any item except Item_sum contains a field.
> + Set during parsing. */
> + fixed:1; /* If item was fixed with fix_fields */
> +
> + int16 marker;
> +
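The effect of the first two bullet points in the commit message (reordering members to remove alignment padding, and packing bool flags into bit fields) can be checked with toy structs. Exact sizes are ABI-dependent, so the sketch only asserts an ordering, not specific byte counts. The structs are illustrative and unrelated to the real Item layout.

```cpp
#include <cassert>
#include <cstdint>

// Small members around a pointer: typical ABIs pad after each char.
struct Padded {
  char a;
  void *p;
  char b;
};

// Same members, largest first: the two chars share one padded tail.
struct Reordered {
  void *p;
  char a;
  char b;
};

// Six separate bool flags, each at least one byte.
struct BoolFlags {
  bool maybe_null, in_rollup, with_param, with_window_func, with_field, fixed;
};

// The same six flags packed into one 32-bit word, as in the hunk above.
struct BitFlags {
  std::uint32_t maybe_null:1, in_rollup:1, with_param:1,
                with_window_func:1, with_field:1, fixed:1;
};
```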
> diff --git a/storage/columnstore/columnstore b/storage/columnstore/columnstore
> index b6b02ed516f..f606e76fb77 160000
> --- a/storage/columnstore/columnstore
> +++ b/storage/columnstore/columnstore
> @@ -1 +1 @@
> -Subproject commit b6b02ed516f92055127d416370799d91a82754ea
> +Subproject commit f606e76fb779e40f3376693fff9969e4f2b7669a
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Re: [Maria-developers] 4088269c578d: Improved storage size for Item, Field and some other classes
by Sergei Golubchik 14 May '21
Hi, Michael!
On Dec 03, Michael Widenius wrote:
> revision-id: 4088269c578d (mariadb-10.5.2-270-g4088269c578d)
> parent(s): 994ea2af3973
> author: Michael Widenius <michael.widenius(a)gmail.com>
> committer: Michael Widenius <michael.widenius(a)gmail.com>
> timestamp: 2020-09-17 12:25:44 +0300
> message:
>
> Improved storage size for Item, Field and some other classes
>
> diff --git a/storage/columnstore/columnstore b/storage/columnstore/columnstore
> index b6b02ed516f9..f606e76fb779 160000
> --- a/storage/columnstore/columnstore
> +++ b/storage/columnstore/columnstore
> @@ -1 +1 @@
> -Subproject commit b6b02ed516f92055127d416370799d91a82754ea
> +Subproject commit f606e76fb779e40f3376693fff9969e4f2b7669a
Again. You should not push invalid commit hashes into the main branch:
storage/columnstore/columnstore $ git fetch origin
storage/columnstore/columnstore $ git show f606e76fb779e40f3376693fff9969e4f2b7669a
fatal: bad object f606e76fb779e40f3376693fff9969e4f2b7669a
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
Re: [Maria-developers] 9bf4b92cbc5: MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
by Sergei Golubchik 31 Mar '21
Hi, Aleksey!
On Jun 29, Aleksey Midenkov wrote:
> revision-id: 9bf4b92cbc5 (mariadb-10.5.2-168-g9bf4b92cbc5)
> parent(s): 478301d9b9a
> author: Aleksey Midenkov <midenok(a)gmail.com>
> committer: Aleksey Midenkov <midenok(a)gmail.com>
> timestamp: 2020-04-17 17:04:24 +0300
> message:
>
> MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
>
> == Syntax change ==
>
> Keyword AUTO_INCREMENT (or AUTO) enables partition auto-creation.
I still think that AUTO_INCREMENT here looks rather out of place;
I don't know how to explain to users why AUTO_INCREMENT is allowed here.
But if we add it, we'll have to support it here for a long time
for compatibility reasons ("because somebody might already be using it").
I'd suggest supporting only AUTO here.
Or, if the parser can do it easily, a much more readable syntax would be
create table t1 (x int) with system versioning
auto partition by system_time interval 1 hour;
this would be the best, easy to read, matches the documentation (where
it could be called "auto partitioning"). If it won't cause any difficult
parser problems, I'd prefer us to use that.
> create or replace table t1 (x int) with system versioning
> partition by system_time interval 1 hour auto_increment;
>
> create or replace table t1 (x int) with system versioning
> partition by system_time limit 1000 auto;
>
> Or with explicit partitions:
>
> create or replace table t1 (x int) with system versioning
> partition by system_time interval 1 hour auto
> (partition p0 history, partition pn current);
>
> == Description ==
>
> Before executing history-generating DML command add N history
> partitions, so that N would be sufficient for potentially generated
> history. N > 1 may be required when history is rotated by INTERVAL and
> timestamp was jumped to future further than interval value.
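The "N > 1" case can be made concrete. Assume each auto-created partition is exactly one interval wide, and that a history partition covers times strictly below its range_value (the same test the loop in vers_set_hist_part() uses). Then the smallest sufficient N is floor((now - range_end) / interval) + 1. This is a lower bound derived from the description, not a transcription of the create_count formula in the patch. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of new one-interval-wide history partitions, appended after
// a last history partition whose range ends at range_end, such that the newest
// partition's bound is strictly greater than `now`. Assumes interval > 0.
std::int64_t min_partitions_needed(std::int64_t range_end, std::int64_t now,
                                   std::int64_t interval) {
  if (now < range_end)
    return 0;  // the current history partition still covers `now`
  return (now - range_end) / interval + 1;
}
```

For example, with a one-hour interval and a clock 2.5 hours (9000 s) past range_end, this gives 9000 / 3600 + 1 = 3 partitions.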
>
> It is assumed that one DML command will not generate history rows more
> than specified LIMIT. Otherwise history would overflow generated
> partition, a warning would be printed, and user action would be
> required to rebuild partitions.
>
> Auto-creation is implemented by synchronous
> fast_alter_partition_table() call from the thread of the executed DML
> command before the command itself (by the fallback and retry mechanism
> similar to Discovery feature, see Open_table_context).
>
> Creating history partitions was made by the principle of minimal
> disruption of the main business process. I.e. when procedure of
> creation fails it will not (if possible) fail the original DML
> command. Warning info will display what happened and the last history
> partition may overflow. In such case partition rebuild is required to
> correctly display history info. User application may detect warning
> code ER_VERS_HIST_PART_FAILED and stop execution if that is preferred.
>
> The name for newly added partitions are generated like default
> partition names with extension of MDEV-22155 (which avoids name
> clashes by extending assignment counter to next free-enough gap).
>
> These DML commands trigger auto-creation:
>
> * DELETE (including multi-delete, excluding DELETE HISTORY)
> * UPDATE (including multi-update)
> * REPLACE (including REPLACE .. SELECT)
INSERT ... ON DUPLICATE KEY UPDATE?
> diff --git a/mysql-test/suite/versioning/common.inc b/mysql-test/suite/versioning/common.inc
> index 355b571e5a0..b35a5138015 100644
> --- a/mysql-test/suite/versioning/common.inc
> +++ b/mysql-test/suite/versioning/common.inc
> @@ -6,6 +6,7 @@ if (!$TEST_VERSIONING_SO)
> source include/have_innodb.inc;
>
> set @@session.time_zone='+00:00';
> +set @@global.time_zone='+00:00';
Why is that? I'd expect your auto-adding of partitions to happen in the
context of a session, using session time zone.
Is it for replication?
> select ifnull(max(transaction_id), 0) into @start_trx_id from mysql.transaction_registry;
> set @test_start=now(6);
>
> diff --git a/mysql-test/suite/versioning/r/delete_history.result b/mysql-test/suite/versioning/r/delete_history.result
> index cb865a835b3..90c9e4777bb 100644
> --- a/mysql-test/suite/versioning/r/delete_history.result
> +++ b/mysql-test/suite/versioning/r/delete_history.result
> @@ -63,8 +63,6 @@ insert into t values (1);
> update t set a= 2;
> update t set a= 3;
> delete history from t;
> -Warnings:
> -Warning 4114 Versioned table `test`.`t`: last HISTORY partition (`p1`) is out of LIMIT, need more HISTORY partitions
Good riddance, it didn't make much sense anyway.
> # The above warning is one command late (MDEV-20345) ^^^
> select * from t for system_time all;
> a
> diff --git a/sql/ha_partition.h b/sql/ha_partition.h
> index 85cb736b5bd..5fb893b4f0e 100644
> --- a/sql/ha_partition.h
> +++ b/sql/ha_partition.h
> @@ -1623,7 +1623,7 @@ class ha_partition :public handler
Could you also please fix (in a separate commit)
ha_partition::part_records() to take a proper
partition_element* argument, not void*?
> for (; part_id < part_id_end; ++part_id)
> {
> handler *file= m_file[part_id];
> - DBUG_ASSERT(bitmap_is_set(&(m_part_info->read_partitions), part_id));
> + bitmap_set_bit(&(m_part_info->read_partitions), part_id);
> file->info(HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK | HA_STATUS_OPEN);
> part_recs+= file->stats.records;
> }
> diff --git a/sql/partition_info.cc b/sql/partition_info.cc
> index e9dbf2b49c3..72312ee7ac4 100644
> --- a/sql/partition_info.cc
> +++ b/sql/partition_info.cc
> @@ -814,10 +814,16 @@ bool partition_info::has_unique_name(partition_element *element)
> vers_info->interval Limit by fixed time interval
> vers_info->hist_part (out) Working history partition
> */
> -void partition_info::vers_set_hist_part(THD *thd)
> +uint partition_info::vers_set_hist_part(THD *thd, bool auto_inc)
Let's not call it auto_inc; it'll add noise to every grep for auto_inc
issues. What about auto_add?
> {
> + DBUG_ASSERT(!thd->lex->last_table()->vers_conditions.delete_history);
> +
> + uint create_count= 0;
> + auto_inc= auto_inc && vers_info->auto_inc;
> +
> if (vers_info->limit)
> {
> + DBUG_ASSERT(!vers_info->interval.is_set());
> ha_partition *hp= (ha_partition*)(table->file);
> partition_element *next= NULL;
> List_iterator<partition_element> it(partitions);
> @@ -836,22 +842,26 @@ void partition_info::vers_set_hist_part(THD *thd)
> {
> if (next == vers_info->now_part)
> {
> - my_error(WARN_VERS_PART_FULL, MYF(ME_WARNING|ME_ERROR_LOG),
> - table->s->db.str, table->s->table_name.str,
> - vers_info->hist_part->partition_name, "LIMIT");
> + if (auto_inc)
> + create_count= 1;
> + else
> + my_error(WARN_VERS_PART_FULL, MYF(ME_WARNING|ME_ERROR_LOG),
> + table->s->db.str, table->s->table_name.str,
> + vers_info->hist_part->partition_name, "LIMIT");
> }
> else
> vers_info->hist_part= next;
> }
> - return;
> + // reserve at least one history partition
> + if (auto_inc && create_count == 0 &&
> + vers_info->hist_part->id + 1 == vers_info->now_part->id)
> + create_count= 1;
Questionable. What does it solve?
> }
> -
> - if (vers_info->interval.is_set())
> + else if (vers_info->interval.is_set() &&
> + vers_info->hist_part->range_value <= thd->query_start())
> {
> - if (vers_info->hist_part->range_value > thd->query_start())
> - return;
> -
> partition_element *next= NULL;
> + bool error= true;
> List_iterator<partition_element> it(partitions);
> while (next != vers_info->hist_part)
> next= it++;
> @@ -860,12 +870,200 @@ void partition_info::vers_set_hist_part(THD *thd)
> {
> vers_info->hist_part= next;
> if (next->range_value > thd->query_start())
> - return;
> + {
> + error= false;
> + break;
But here you don't "reserve at least one history partition".
Why is INTERVAL different?
> + }
> + }
> + if (error)
> + {
> + if (auto_inc)
> + {
> + DBUG_ASSERT(thd->query_start() >= vers_info->hist_part->range_value);
> + my_time_t diff= thd->query_start() - vers_info->hist_part->range_value;
> + if (diff > 0)
> + {
> + size_t delta= vers_info->interval.seconds();
> + create_count= diff / delta + 1;
> + if (diff % delta)
> + create_count++;
> + }
> + else
> + create_count= 1;
> + }
> + else
> + {
> + my_error(WARN_VERS_PART_FULL, MYF(ME_WARNING|ME_ERROR_LOG),
> + table->s->db.str, table->s->table_name.str,
> + vers_info->hist_part->partition_name, "INTERVAL");
> + }
> }
> - my_error(WARN_VERS_PART_FULL, MYF(ME_WARNING|ME_ERROR_LOG),
> - table->s->db.str, table->s->table_name.str,
> - vers_info->hist_part->partition_name, "INTERVAL");
> }
> +
> + return create_count;
> +}
> +
> +
> +/**
> + @brief Run fast_alter_partition_table() to add new history partitions
> + for tables requiring them.
> +*/
> +void vers_add_auto_parts(THD *thd, TABLE_LIST* tl, uint num_parts)
> +{
> + HA_CREATE_INFO create_info;
> + Alter_info alter_info;
> + String query;
> + partition_info *save_part_info= thd->work_part_info;
> + Query_tables_list save_query_tables;
> + Reprepare_observer *save_reprepare_observer= thd->m_reprepare_observer;
> + Diagnostics_area new_stmt_da(thd->query_id, false, true);
> + Diagnostics_area *save_stmt_da= thd->get_stmt_da();
> + bool save_no_write_to_binlog= thd->lex->no_write_to_binlog;
> + const CSET_STRING save_query= thd->query_string;
> + thd->m_reprepare_observer= NULL;
> + thd->lex->reset_n_backup_query_tables_list(&save_query_tables);
> + thd->in_sub_stmt|= SUB_STMT_AUTO_HIST;
> + thd->lex->no_write_to_binlog= !thd->is_current_stmt_binlog_format_row();
> + TABLE *table= tl->table;
> +
> + DBUG_ASSERT(!thd->is_error());
> + /* NB: we have to preserve m_affected_rows, m_row_count_func, m_last_insert_id, etc */
> + thd->set_stmt_da(&new_stmt_da);
> + new_stmt_da.set_overwrite_status(true);
Is it needed?
You've just started using a new Diagnostics_area, the status should be
DA_EMPTY here, nothing to overwrite.
> +
> + {
> + DBUG_ASSERT(table->s->get_table_ref_type() == TABLE_REF_BASE_TABLE);
> + DBUG_ASSERT(table->versioned());
> + DBUG_ASSERT(table->part_info);
> + DBUG_ASSERT(table->part_info->vers_info);
> + alter_info.reset();
> + alter_info.partition_flags= ALTER_PARTITION_ADD|ALTER_PARTITION_AUTO_HIST;
> + create_info.init();
> + create_info.alter_info= &alter_info;
> + Alter_table_ctx alter_ctx(thd, tl, 1, &table->s->db, &table->s->table_name);
> +
> + MDL_REQUEST_INIT(&tl->mdl_request, MDL_key::TABLE, tl->db.str,
> + tl->table_name.str, MDL_SHARED_NO_WRITE, MDL_TRANSACTION);
> + if (thd->mdl_context.acquire_lock(&tl->mdl_request,
> + thd->variables.lock_wait_timeout))
> + goto exit;
> + table->mdl_ticket= tl->mdl_request.ticket;
> +
> + create_info.db_type= table->s->db_type();
> + create_info.options|= HA_VERSIONED_TABLE;
> + DBUG_ASSERT(create_info.db_type);
> +
> + create_info.vers_info.set_start(table->s->vers_start_field()->field_name);
> + create_info.vers_info.set_end(table->s->vers_end_field()->field_name);
> +
> + partition_info *part_info= new partition_info();
> + if (unlikely(!part_info))
> + {
> + my_error(ER_OUT_OF_RESOURCES, MYF(0));
> + goto exit;
> + }
> + part_info->use_default_num_partitions= false;
> + part_info->use_default_num_subpartitions= false;
> + part_info->num_parts= num_parts;
> + part_info->num_subparts= table->part_info->num_subparts;
> + part_info->subpart_type= table->part_info->subpart_type;
> + if (unlikely(part_info->vers_init_info(thd)))
> + {
> + my_error(ER_OUT_OF_RESOURCES, MYF(0));
> + goto exit;
> + }
> +
> + // NB: set_ok_status() requires DA_EMPTY
> + thd->get_stmt_da()->reset_diagnostics_area();
Is it needed?
You've just started using a new Diagnostics_area, the status should be
DA_EMPTY here.
> +
> + thd->work_part_info= part_info;
> + if (part_info->set_up_defaults_for_partitioning(thd, table->file, NULL,
> + table->part_info->next_part_no(num_parts)))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN,
> + ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "setting up defaults failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> + bool partition_changed= false;
> + bool fast_alter_partition= false;
> + if (prep_alter_part_table(thd, table, &alter_info, &create_info,
> + &partition_changed, &fast_alter_partition))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "alter partitition prepare failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> + if (!fast_alter_partition)
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "fast alter partitition is not possible");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> + DBUG_ASSERT(partition_changed);
> + if (mysql_prepare_alter_table(thd, table, &create_info, &alter_info,
> + &alter_ctx))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "alter prepare failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + goto exit;
> + }
> +
> + // Forge query string for rpl logging
> + if (!thd->lex->no_write_to_binlog)
> + {
> + query.set(STRING_WITH_LEN("ALTER TABLE `"), &my_charset_latin1);
better use StringBuffer<...> query;
because you're doing lots of reallocs now.
> +
> + if (query.append(table->s->db) ||
> + query.append(STRING_WITH_LEN("`.`")) ||
> + query.append(table->s->table_name) ||
this is wrong, the table and db names can have backticks inside.
use append_identifier() instead.
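For illustration only (this is not MariaDB code): a minimal standalone sketch of what append_identifier() is needed for here. A backtick inside a backtick-quoted identifier is escaped by doubling it. The helpers append_quoted_identifier() and forge_add_partition_query() are hypothetical names. The buffer is reserved once up front, in the spirit of the StringBuffer suggestion. The real append_identifier() may handle more cases than this model does.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for append_identifier(): wraps a name in backticks
// and doubles every backtick inside it.
void append_quoted_identifier(std::string &out, std::string_view name)
{
  out.push_back('`');
  for (char c : name)
  {
    if (c == '`')
      out.push_back('`');   // escape an embedded backtick by doubling it
    out.push_back(c);
  }
  out.push_back('`');
}

// Builds the forged ALTER text. The reserve() covers the worst case
// (every identifier character doubled plus the fixed text and up to
// 20 digits), so the appends below never reallocate.
std::string forge_add_partition_query(std::string_view db,
                                      std::string_view table,
                                      unsigned long long num_parts)
{
  std::string query;
  query.reserve(80 + 2 * (db.size() + table.size()));
  query.append("ALTER TABLE ");
  append_quoted_identifier(query, db);
  query.push_back('.');
  append_quoted_identifier(query, table);
  query.append(" ADD PARTITION PARTITIONS ");
  query.append(std::to_string(num_parts));
  query.append(" AUTO");
  return query;
}
```

With a table named t`1 this yields ALTER TABLE `test`.`t``1` ADD PARTITION PARTITIONS 2 AUTO. The hand-rolled "`.`" concatenation in the patch would produce an unparseable statement for the same name.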
> + query.append("` ADD PARTITION PARTITIONS ") ||
> + query.append_ulonglong(part_info->num_parts) ||
> + query.append(" AUTO"))
> + {
> + my_error(ER_OUT_OF_RESOURCES, MYF(ME_ERROR_LOG));
> + goto exit;
> + }
> + CSET_STRING qs(query.c_ptr(), query.length(), &my_charset_latin1);
> + thd->set_query(qs);
> + }
Should it be binlogged at all? Maybe just leave it to the slave to
auto-add the partition as needed?
It looks a bit suspicious now: you log an ALTER TABLE, possibly in the
middle of a transaction. It will be replayed differently, not as
auto-adding; in particular, it will commit. So there may be different
results on the slave, perhaps a different gtid.
Please clear it up with Andrei. Or don't binlog auto-adding at all.
> +
> + if (fast_alter_partition_table(thd, table, &alter_info, &create_info,
> + tl, &table->s->db, &table->s->table_name))
> + {
> + push_warning(thd, Sql_condition::WARN_LEVEL_WARN, ER_VERS_HIST_PART_FAILED,
> + "Auto-increment history partition: "
> + "alter partition table failed");
> + my_error(ER_VERS_HIST_PART_FAILED, MYF(ME_WARNING),
> + tl->db.str, tl->table_name.str);
> + }
> + }
> +
> + if (!thd->transaction.stmt.is_empty())
> + trans_commit_stmt(thd);
1. why would this operation register a transaction?
2. you amended a couple of checks like
- if (thd->in_sub_stmt)
+ if (thd->in_sub_stmt & ~SUB_STMT_AUTO_HIST)
but what will happen if it's, for example, an insert inside a trigger?
So a real sub_stmt and SUB_STMT_AUTO_HIST?
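The bit arithmetic behind question 2, as a standalone sketch. The flag values are made up for illustration; the real SUB_STMT_* constants live in sql_class.h and may differ. With the amended check, an auto-history ALTER started from an INSERT inside a trigger has both bits set, so the check still reports a real sub-statement and takes that branch. Whether that branch then does the right thing is exactly the open question.

```cpp
#include <cstdint>

// Hypothetical flag values, for illustration only.
constexpr std::uint32_t SUB_STMT_TRIGGER=   1;
constexpr std::uint32_t SUB_STMT_AUTO_HIST= 4;

// Mirrors the amended check `thd->in_sub_stmt & ~SUB_STMT_AUTO_HIST`:
// true whenever any sub-statement bit other than AUTO_HIST is set.
bool real_sub_stmt(std::uint32_t in_sub_stmt)
{
  return (in_sub_stmt & ~SUB_STMT_AUTO_HIST) != 0;
}
```

Here real_sub_stmt(SUB_STMT_AUTO_HIST) is false, but real_sub_stmt(SUB_STMT_TRIGGER | SUB_STMT_AUTO_HIST) is true.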
> +
> +exit:
> + thd->work_part_info= save_part_info;
> + thd->m_reprepare_observer= save_reprepare_observer;
> + thd->lex->restore_backup_query_tables_list(&save_query_tables);
> + thd->in_sub_stmt&= ~SUB_STMT_AUTO_HIST;
> + if (!new_stmt_da.is_warning_info_empty())
> + save_stmt_da->copy_sql_conditions_from_wi(thd, new_stmt_da.get_warning_info());
> + thd->set_stmt_da(save_stmt_da);
> + thd->lex->no_write_to_binlog= save_no_write_to_binlog;
> + thd->set_query(save_query);
> }
>
>
> diff --git a/sql/partition_info.h b/sql/partition_info.h
> index 7ae2d168068..ca68e61cbe2 100644
> --- a/sql/partition_info.h
> +++ b/sql/partition_info.h
> @@ -72,9 +73,34 @@ struct Vers_part_info : public Sql_alloc
> my_time_t start;
> INTERVAL step;
> enum interval_type type;
> - bool is_set() { return type < INTERVAL_LAST; }
> + bool is_set() const { return type < INTERVAL_LAST; }
> + size_t seconds() const
> + {
> + if (step.second)
> + return step.second;
> + if (step.minute)
> + return step.minute * 60;
> + if (step.hour)
> + return step.hour * 3600;
> + if (step.day)
> + return step.day * 3600 * 24;
> + // comparison is used in rough estimates, it doesn't need to be calendar-correct
Are you sure the approximate value is enough here? You use it to
estimate how many partitions to create. It's not a big deal if you
create more than necessary (although it's not nice and we'll definitely
get bug reports if that will happen). But you surely don't want to
create less.
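This is one way to make sure the estimate never falls short, sketched under assumptions. Instead of dividing by an average interval length, divide by a lower bound on it: 28 days per month and 365 days per year. Every real interval is at least that long, and the count below never increases as the divisor grows. So the count can only come out equal or higher, at the cost of sometimes creating an extra partition. Step, min_seconds() and partitions_to_create() are hypothetical names. The counting arithmetic copies the create_count computation quoted above.

```cpp
#include <cstdint>

// Simplified stand-in for the INTERVAL step; only one unit is expected to be set.
struct Step
{
  std::uint64_t second= 0, minute= 0, hour= 0, day= 0, month= 0, year= 0;
};

// Lower bound on the interval length in seconds: a month is never shorter
// than 28 days and a year is never shorter than 365 days.
std::uint64_t min_seconds(const Step &s)
{
  if (s.second) return s.second;
  if (s.minute) return s.minute * 60;
  if (s.hour)   return s.hour * 3600;
  if (s.day)    return s.day * 86400;
  if (s.month)  return s.month * 86400 * 28;
  return s.year * 86400 * 365;
}

// Same arithmetic as the patch's create_count, but with the lower bound as delta.
std::uint64_t partitions_to_create(std::uint64_t diff, const Step &s)
{
  if (diff == 0)
    return 1;
  std::uint64_t delta= min_seconds(s);
  std::uint64_t count= diff / delta + 1;
  if (diff % delta)
    count++;
  return count;
}
```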
> + if (step.month)
> + return step.month * 3600 * 24 * 30;
> + DBUG_ASSERT(step.year);
> + return step.year * 86400 * 30 * 365;
> + }
> + bool lt(size_t secs) const
> + {
> + return seconds() < secs;
> + }
> + bool ge(size_t seconds) const
> + {
> + return !(this->lt(seconds));
> + }
lt() and ge() don't seem to be used anywhere.
> } interval;
> ulonglong limit;
> + bool auto_inc;
> partition_element *now_part;
> partition_element *hist_part;
> };
> diff --git a/sql/sql_base.cc b/sql/sql_base.cc
> index b89be77f282..5583d70f8eb 100644
> --- a/sql/sql_base.cc
> +++ b/sql/sql_base.cc
> @@ -1862,6 +1862,25 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx)
> DBUG_PRINT("info",("Using locked table"));
> #ifdef WITH_PARTITION_STORAGE_ENGINE
> part_names_error= set_partitions_as_used(table_list, table);
> + if (table->part_info &&
> + table->part_info->part_type == VERSIONING_PARTITION &&
> + !table_list->vers_conditions.delete_history &&
> + table_list->lock_type >= TL_WRITE_ALLOW_WRITE &&
> + table_list->mdl_request.type == MDL_SHARED_WRITE)
> + {
> + switch (thd->lex->sql_command)
> + {
> + case SQLCOM_DELETE:
> + case SQLCOM_UPDATE:
> + case SQLCOM_REPLACE:
> + case SQLCOM_REPLACE_SELECT:
> + case SQLCOM_DELETE_MULTI:
> + case SQLCOM_UPDATE_MULTI:
> + /* Rotation is still needed under LOCK TABLES */
> + table->part_info->vers_set_hist_part(thd, false);
ALTER TABLE works under LOCK TABLES, so this should be possible too,
but let's have it as a separate task.
> + default:;
> + }
> + }
> #endif
> goto reset;
> }
> @@ -2104,6 +2123,37 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx)
> tc_add_table(thd, table);
> }
>
> +#ifdef WITH_PARTITION_STORAGE_ENGINE
> + if (table->part_info &&
> + table->part_info->part_type == VERSIONING_PARTITION &&
> + !table_list->vers_conditions.delete_history &&
> + table_list->lock_type >= TL_WRITE_ALLOW_WRITE &&
> + table_list->mdl_request.type >= MDL_SHARED_WRITE &&
> + table_list->mdl_request.type < MDL_EXCLUSIVE)
> + {
> + switch (thd->lex->sql_command)
> + {
> + case SQLCOM_LOCK_TABLES:
> + case SQLCOM_DELETE:
> + case SQLCOM_UPDATE:
> + case SQLCOM_REPLACE:
> + case SQLCOM_REPLACE_SELECT:
> + case SQLCOM_DELETE_MULTI:
> + case SQLCOM_UPDATE_MULTI:
this is quite a complex condition, better not to duplicate it. Maybe
something like
bool need_set_hist_part(TABLE_LIST *table_list, enum_sql_command sql_command)
{
  if (table_list->table->part_info &&
      table_list->table->part_info->part_type == VERSIONING_PARTITION &&
      !table_list->vers_conditions.delete_history &&
      table_list->lock_type >= TL_WRITE_ALLOW_WRITE &&
      table_list->mdl_request.type >= MDL_SHARED_WRITE &&
      table_list->mdl_request.type < MDL_EXCLUSIVE)
  {
    switch (sql_command) {
    case SQLCOM_LOCK_TABLES:
    case SQLCOM_DELETE:
    case SQLCOM_UPDATE:
    case SQLCOM_REPLACE:
    case SQLCOM_REPLACE_SELECT:
    case SQLCOM_DELETE_MULTI:
    case SQLCOM_UPDATE_MULTI:
      return true;
    default:;
    }
  }
  return false;
}
> + ot_ctx->vers_create_count= table->part_info->vers_set_hist_part(thd, true);
> + if (ot_ctx->vers_create_count)
> + {
> + ot_ctx->request_backoff_action(Open_table_context::OT_ADD_HISTORY_PARTITION,
> + table_list);
> + MYSQL_UNBIND_TABLE(table->file);
> + tc_release_table(table);
There's no MYSQL_UNBIND_TABLE/tc_release_table after other
request_backoff_action invocations. Why this one is special?
> + DBUG_RETURN(TRUE);
> + }
> + default:;
> + }
> + }
> +#endif
> +
> if (!(flags & MYSQL_OPEN_HAS_MDL_LOCK) &&
> table->s->table_category < TABLE_CATEGORY_INFORMATION)
> {
> diff --git a/sql/sql_class.h b/sql/sql_class.h
> index 14516811262..145ac5c5f64 100644
> --- a/sql/sql_class.h
> +++ b/sql/sql_class.h
> @@ -3346,6 +3347,10 @@ class THD: public THD_count, /* this must be first */
>
> #ifdef WITH_PARTITION_STORAGE_ENGINE
> partition_info *work_part_info;
> + /**
> + List of tables requiring new history partition.
> + */
> + List<TABLE_SHARE> vers_auto_part_tables;
this doesn't seem to be used anywhere
> #endif
>
> #ifndef EMBEDDED_LIBRARY
> diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy
> index 7cc1faea79b..3f6c1793432 100644
> --- a/sql/sql_yacc.yy
> +++ b/sql/sql_yacc.yy
> @@ -7521,14 +7529,17 @@ add_partition_rule:
>
> add_part_extra:
> /* empty */
> - | '(' part_def_list ')'
> + | '(' part_def_list ')' opt_vers_auto_inc
> {
> - LEX *lex= Lex;
> - lex->part_info->num_parts= lex->part_info->partitions.elements;
> + Lex->part_info->num_parts= Lex->part_info->partitions.elements;
> + if ($4)
> + Lex->alter_info.partition_flags|= ALTER_PARTITION_AUTO_HIST;
> }
> - | PARTITIONS_SYM real_ulong_num
> + | PARTITIONS_SYM real_ulong_num opt_vers_auto_inc
> {
> Lex->part_info->num_parts= $2;
> + if ($3)
> + Lex->alter_info.partition_flags|= ALTER_PARTITION_AUTO_HIST;
I'm confused. I thought that ALTER_PARTITION_AUTO_HIST is what you set
in vers_add_auto_parts() to mark auto-adding of a new partition.
But here you set it in the parser, when a user specifies AUTO in
partition specifications. So it seems you're using this flag for two
very different purposes. How do you distinguish between them?
And why did you decide to do it this way?
> }
> ;
>
> diff --git a/sql/sql_partition.cc b/sql/sql_partition.cc
> index fbed614489d..25017ee5425 100644
> --- a/sql/sql_partition.cc
> +++ b/sql/sql_partition.cc
> @@ -4825,6 +4829,13 @@ uint prep_alter_part_table(THD *thd, TABLE *table, Alter_info *alter_info,
> DBUG_RETURN(TRUE);
> }
>
> + if (alter_info->partition_flags & ALTER_PARTITION_AUTO_HIST &&
> + (!table->part_info || !table->part_info->vers_info))
> + {
> + my_error(ER_SYNTAX_ERROR, MYF(0));
can it even happen? Or should it be DBUG_ASSERT(0) here?
> + DBUG_RETURN(TRUE);
> + }
> +
> partition_info *alt_part_info= thd->lex->part_info;
> /*
> This variable is TRUE in very special case when we add only DEFAULT
> @@ -5312,7 +5323,9 @@ that are reorganised.
> now_part= el;
> }
> }
> - if (*fast_alter_table && tab_part_info->vers_info->interval.is_set())
> + if (*fast_alter_table &&
> + !(alter_info->partition_flags & ALTER_PARTITION_AUTO_HIST) &&
> + tab_part_info->vers_info->interval.is_set())
this !ALTER_PARTITION_AUTO_HIST - do you mean not vers_add_auto_parts() or
not sql statement that uses AUTO ?
> {
> partition_element *hist_part= tab_part_info->vers_info->hist_part;
> if (hist_part->range_value <= thd->query_start())
> @@ -5347,7 +5360,8 @@ that are reorganised.
> */
> if (!(alter_info->partition_flags & ALTER_PARTITION_TABLE_REORG))
> {
> - if (!alt_part_info->use_default_partitions)
> + if (!alt_part_info->use_default_partitions &&
> + !(alter_info->partition_flags & ALTER_PARTITION_AUTO_HIST))
Sorry, I don't understand that if() at all. What does it do?
> {
> DBUG_PRINT("info", ("part_info: %p", tab_part_info));
> tab_part_info->use_default_partitions= FALSE;
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
01 Mar '21
Hi Alexey,
I was looking at Json_table_nested_path::set_position(), wondering why it
has an assignment
np->m_null= TRUE;
but doesn't clear the NULL values. While trying to come up with an example of this
going wrong, I hit this crash:
select * from
json_table(
'[
{"name": "X",
"colors":["blue"], "sizes": [1,2,3,4], "prices" : [10,20]},
{"name": "Y",
"colors":["red"], "sizes": [10,11], "prices" : [100,200,300]}
]',
'$[*]' columns
(
seq0 for ordinality,
name varchar(4) path '$.name',
nested path '$.colors[*]' columns (
seq1 for ordinality,
color text path '$'
),
nested path '$.sizes[*]' columns (
seq2 for ordinality,
size int path '$'
),
nested path '$.prices[*]' columns (
seq3 for ordinality,
price int path '$'
)
)
) as T order by seq0, name;
Note this==NULL:
(gdb) wher
#0 0x00005555560edf72 in Json_table_nested_path::set_position (this=0x0, j_start=0x7ffeb0016e68 "[ \n {\"name\": \"X\", \n \"colors\":[\"blue\"], \"sizes\": [1,2,3,4], \"prices\" : [10,20]},\n {\"name\": \"Y\", \n \"colors\":[\"red\"], \"sizes\": [10,11], \"prices\" : [100,200,300]}\n]", j_end=0x7ffeb0016f12 "", pos=0x7ffeb0035e51 "\245\245\245\245\245\245\245\245\006") at /home/psergey/dev-git2/10.6-hf-review6/sql/table_function.cc:239
#1 0x00005555560ee12f in Json_table_nested_path::set_position (this=0x7ffeb0017060, j_start=0x7ffeb0016e68 "[ \n {\"name\": \"X\", \n \"colors\":[\"blue\"], \"sizes\": [1,2,3,4], \"prices\" : [10,20]},\n {\"name\": \"Y\", \n \"colors\":[\"red\"], \"sizes\": [10,11], \"prices\" : [100,200,300]}\n]", j_end=0x7ffeb0016f12 "", pos=0x7ffeb0035e48 "") at /home/psergey/dev-git2/10.6-hf-review6/sql/table_function.cc:262
#2 0x00005555560ee9f0 in ha_json_table::rnd_pos (this=0x7ffeb0014f00, buf=0x7ffeb0025570 "\377", pos=0x7ffeb0035e48 "") at /home/psergey/dev-git2/10.6-hf-review6/sql/table_function.cc:434
#3 0x00005555561ca6a4 in handler::ha_rnd_pos (this=0x7ffeb0014f00, buf=0x7ffeb0025570 "\377", pos=0x7ffeb0035e48 "") at /home/psergey/dev-git2/10.6-hf-review6/sql/handler.cc:3101
#4 0x00005555563852e3 in rr_from_pointers (info=0x7ffeb001f9e0) at /home/psergey/dev-git2/10.6-hf-review6/sql/records.cc:615
#5 0x0000555555da4a75 in READ_RECORD::read_record (this=0x7ffeb001f9e0) at /home/psergey/dev-git2/10.6-hf-review6/sql/records.h:81
#6 0x0000555555ee1876 in join_init_read_record (tab=0x7ffeb001f918) at /home/psergey/dev-git2/10.6-hf-review6/sql/sql_select.cc:21644
#7 0x0000555555edf35a in sub_select (join=0x7ffeb001d948, join_tab=0x7ffeb001f918, end_of_records=false) at /home/psergey/dev-git2/10.6-hf-review6/sql/sql_select.cc:20666
#8 0x0000555555ede8e6 in do_select (join=0x7ffeb001d948, procedure=0x0) at /home/psergey/dev-git2/10.6-hf-review6/sql/sql_select.cc:20216
#9 0x0000555555eb24e7 in JOIN::exec_inner (this=0x7ffeb001d948) at /home/psergey/dev-git2/10.6-hf-review6/sql/sql_select.cc:4484
#10 0x0000555555eb1613 in JOIN::exec (this=0x7ffeb001d948) at /home/psergey/dev-git2/10.6-hf-review6/sql/sql_select.cc:4264
Please fix.
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
Re: [Maria-developers] ee538938345: MDEV-21117: refine the server binlog-based recovery for semisync
by Sergei Golubchik 26 Feb '21
26 Feb '21
Hi, Sujatha!
The main comment: the logic is so complex that, unfortunately, I wasn't
able to understand it.
I've reviewed almost everything (see comments below),
but not the Recovery_context methods. Please explain how they work and
how all these truncate_validated, truncate_reset_done,
truncate_set_in_1st, etc. work together.
On Feb 26, Sujatha wrote:
> revision-id: ee538938345 (mariadb-10.3.26-68-gee538938345)
> parent(s): 7d04ce6a2d4
> author: Sujatha <sujatha.sivakumar(a)mariadb.com>
> committer: Andrei Elkin <andrei.elkin(a)mariadb.com>
> timestamp: 2021-02-08 17:58:03 +0200
> message:
>
> MDEV-21117: refine the server binlog-based recovery for semisync
>
this should be the MDEV subject. But if you like this one better,
you can rename the MDEV instead
>
> Problem:
> =======
> When the semisync master is crashed and restarted as slave it could
> recover transactions that former slaves may never have seen.
> A known method existed to clear out all prepared transactions
> with --tc-heuristic-recover=rollback does not care to adjust
> binlog accordingly.
>
> Fix:
> ===
> The binlog-based recovery is made to concern of the slave semisync role of
> post-crash restarted server.
> No changes in behaviour is done to the "normal" binloggging server
> and the semisync master.
>
> When the restarted server is configured with
> --rpl-semi-sync-slave-enabled=1
> the refined recovery attempts to roll back prepared transactions
> and truncate binlog accordingly.
> In case of a partically committed (that is committed at least
>
partially
>
> in one of the engine participants) such transaction gets committed.
> It's guaranteed no (partially as well) committed transactions
> exist beyond the truncate position.
> In case there exists a non-transactional replication event
> (being in a way a committed transaction) past the
> computed truncate position the recovery ends with an error.
>
> To facilite the failover on the slave side
facilitate
> conditions to accept own events (having been discarded by the above recovery)
> are relaxed to let so for the semisync slave that connects to master
> in gtid mode. gtid_strict_mode is further recommended to secure
> from inadvertent re-applying out of order gtids in general.
> Non-gtid mode connected semisync slave would require
> --replicate-same-server-id (mind --log-slave-updates must be OFF then).
Sorry, I failed to understand this paragraph :(
>
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_active_log.test b/mysql-test/suite/binlog/t/binlog_truncate_active_log.test
> new file mode 100644
> index 00000000000..cf89525dcac
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_active_log.test
> @@ -0,0 +1,76 @@
> +# ==== Purpose ====
> +#
> +# Test verifies the truncation of single binary log file.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Create table t1 and insert/commit a row.
> +# 1 - Insert an another row such that it gets written to binlog but commit
> +# in engine fails as server crashed at this point.
> +# 2 - Restart server with --rpl-semi-sync-slave-enabled=1
> +# 3 - Upon server start 'master-bin.000001' will be truncated to contain
> +# only the first insert
> +#
> +# ==== References ====
> +#
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +--source include/have_innodb.inc
> +--source include/have_aria.inc
> +--source include/have_log_bin.inc
> +--source include/have_debug.inc
> +--source include/have_binlog_format_statement.inc
have_binlog_format_statement obviously implies have_log_bin.
it's redundant to specify it explicitly.
and you don't need have_debug here
> +
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +
> +# The following cases are tested:
> +# A. 2pc transaction is followed by a blank "zero-engines" one
> +# B. 2pc transaction follows the blank one
> +# C. Similarly to A, with the XA blank transaction
> +
> +--connection default
it's connection default by default (not surprisingly) :)
> +RESET MASTER;
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +CREATE TABLE t2 ( f INT ) ENGINE=INNODB;
> +CREATE TABLE tm ( f INT ) ENGINE=Aria;
Why Aria?
I mean, the default "standard" non-transactional engine is MyISAM.
If you use Aria, it means you thought about it and decided to deviate
intentionally from the default. What were your considerations?
> +
> +--echo # Case A.
> +# Both are doomed into truncation.
> +--let $this_search_pattern = Successfully truncated.*to remove transactions starting from GTID 0-1-6
why not simply --let SEARCH_PATTERN = Successfully truncated...etc...
> +--let $query1 = INSERT INTO t VALUES (20)
> +--let $query2 = DELETE FROM t2 WHERE f = 666 /* no such record */
> +--source binlog_truncate_active_log.inc
> +
> +--echo # Case B.
> +# The inverted sequence ends up to truncate only $query2
> +--let $this_search_pattern = Successfully truncated.*to remove transactions starting from GTID 0-1-10
> +--let $query1 = DELETE FROM t2 WHERE f = 0
why not `= 666 /* no such record */` ?
is `= 0` important here?
> +--let $query2 = INSERT INTO t VALUES (20)
> +--source binlog_truncate_active_log.inc
> +
> +
> +delimiter |;
> +CREATE PROCEDURE sp_blank_xa()
> +BEGIN
> + XA START 'blank';
> + DELETE FROM t2 WHERE f = 666 /* no such record */;
> + XA END 'blank';
> + XA PREPARE 'blank';
> +END|
> +delimiter ;|
> +
> +
> +--echo # Case C.
> +--let $this_search_pattern = Successfully truncated.*to remove transactions starting from GTID 0-1-13
> +--let $query1 = INSERT INTO t VALUES (20)
> +--let $pre_q2 = CALL sp_blank_xa
> +--let $query2 = XA COMMIT 'blank'
> +--source binlog_truncate_active_log.inc
> +DROP PROCEDURE sp_blank_xa;
> +
> +--echo # Cleanup
> +DROP TABLE t,t2,tm;
> +
> +--echo # End of the tests
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_active_log.inc b/mysql-test/suite/binlog/t/binlog_truncate_active_log.inc
> new file mode 100644
> index 00000000000..b1ffaf18268
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_active_log.inc
> @@ -0,0 +1,70 @@
you need
source include/have_debug_sync.inc
here, as this file uses DEBUG_SYNC
> +connect(master1,localhost,root,,);
> +connect(master2,localhost,root,,);
> +connect(master3,localhost,root,,);
just FYI (no need to change anything),
you know that parentheses and extra commas are optional, right?
that is, it could be written as
connect master1,localhost,root;
connect master2,localhost,root;
connect master3,localhost,root;
> +
> +--connection default
> +
> +# First to commit few transactions
> +INSERT INTO t VALUES (10);
> +INSERT INTO tm VALUES (10);
> +
> +--connection master1
> +# Hold insert after write to binlog and before "run_commit_ordered" in engine
> +SET DEBUG_SYNC= "commit_before_get_LOCK_commit_ordered SIGNAL master1_ready WAIT_FOR signal_never_arrives";
> +--send_eval $query1
> +
> +--connection master2
> +SET DEBUG_SYNC= "now WAIT_FOR master1_ready";
> +if ($pre_q2)
is $pre_q2 a flag or a query string?
it's used as a flag, but it's set as a query string.
> +{
> + CALL sp_blank_xa;
> +}
> +SET DEBUG_SYNC= "commit_before_get_LOCK_after_binlog_sync SIGNAL master2_ready";
> +# To binlog non-xid transactional group which will be truncated all right
> +--send_eval $query2
> +
> +
> +--connection master3
> +SET DEBUG_SYNC= "now WAIT_FOR master2_ready";
> +SELECT @@global.gtid_binlog_pos as 'Before the crash';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
> +EOF
> +
> +--source include/kill_mysqld.inc
> +--source include/wait_until_disconnected.inc
did you know that kill_mysqld both writes 'wait' to the expect file
and includes wait_until_disconnected?
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart: --rpl-semi-sync-slave-enabled=1
> +EOF
why "append_file" and not "write_file"?
and why wait, you could've written restart right away, couldn't you?
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# Check error log for a successful truncate message.
> +let $log_error_= `SELECT @@GLOBAL.log_error`;
> +if(!$log_error_)
> +{
> + # MySQL Server on windows is started with --console and thus
> + # does not know the location of its .err log, use default location
> + let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.1.err;
isn't it always $MYSQLTEST_VARDIR/log/mysqld.1.err?
other tests don't do it conditionally
> +}
> +--let SEARCH_FILE=$log_error_
> +--let SEARCH_RANGE=-50000
if you're searching in the error log - and you are -
it's best not to set the SEARCH_RANGE at all. Then it'll search
from the last CURRENT_TEST: label. That is, it'll search in all
error log entries generated by the current test and not in the
error log that came from earlier tests.
> +--let SEARCH_PATTERN=$this_search_pattern
> +--replace_regex /FOUND [0-9]+/FOUND #/
can it be found multiple times? Why would binlog be truncated more than once?
> +--source include/search_pattern_in_file.inc
> +
> +SELECT @@global.gtid_binlog_pos as 'After the crash';
> +--echo "One row should be present in table 't'"
> +SELECT COUNT(*) as 'One' FROM t;
Do you know which one? If yes, why not `SELECT * FROM t` ?
> +
> +# Local cleanup
> +DELETE FROM t;
> +--disconnect master1
> +--disconnect master2
> +--disconnect master3
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test
> new file mode 100644
> index 00000000000..fe153e5703c
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.test
> @@ -0,0 +1,78 @@
> +# ==== Purpose ====
> +#
> +# Test verifies truncation of multiple binary logs.
and "with multiple transactional storage engines" ?
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Create two tables in innodb and rocksdb engines,
> +#
> +# In loop for A,B,C cases (below) do 1-5:
there's no loop now (which is good, don't add it please :)
> +# 1 - execute FLUSH LOGS command to generate a new binary log.
> +# Start a transaction inserting rows of sufficient sizes
> +# so binary log gets rotated at commit
> +# 2 - Using debug simulation make the server crash at a point where
> +# the transaction is written to binary log *and* either of
> +# A. neither of them commits
> +# B. only one commits
> +# C. both commit
> +# 3 - print the # of binlog files before the transaction starts and after its
> +# commit is submitted
> +# 4 - Restart server with --tc-heuristic-recover=BINLOG_TRUNCATE
outdated comment?
> +# 5 - Restart normally to print post recovery status.
> +#
> +# ==== References ====
> +#
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +
> +--source include/have_innodb.inc
> +--source include/have_rocksdb.inc
> +--source include/have_log_bin.inc
this is redundant
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +
> +--let $old_max_binlog_size= `select @@global.max_binlog_size`
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +--let $MYSQLD_DATADIR= `SELECT @@datadir`
> +
> +CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CREATE TABLE t2 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=rocksdb;
> +
> +--let $case = A: neither engine committed => rollback & binlog truncate
> +# Hold off engine commits after write to binlog and its rotation.
> +# The transaction is killed along with the server after that.
> +--let $kill_server=1
> +--let $debug_sync_action = "commit_after_release_LOCK_log SIGNAL con1_ready WAIT_FOR signal_no_signal"
> +--let $restart_parameters = --rpl-semi-sync-slave-enabled=1
> +--let $test_outcome= 1 row should be present in both tables; binlog is truncated; number of binlogs at reconnect - 3
> + --source binlog_truncate_multi_engine.inc
> +--echo Proof of the truncated binlog file is readable (two transactions must be seen):
> +--let $MYSQLD_DATADIR = `select @@datadir`
no need to set the $MYSQLD_DATADIR twice
> +--exec $MYSQL_BINLOG --short-form --skip-annotate-row-events $MYSQLD_DATADIR/master-bin.000002
> +
> +--let $case = B: one engine has committed its transaction branch
> +# Hold off after one engine has committed.
> +--let $kill_server=1
> +--let $debug_sync_action = "commit_after_run_commit_ordered SIGNAL con1_ready WAIT_FOR signal_no_signal"
> +--let $restart_simulate_partial_commit = 1
this variable seems to be unused
> +--let $restart_parameters = --rpl-semi-sync-slave-enabled=1 --debug-dbug=d,binlog_truncate_partial_commit
this seems to be a rather crude way of faking a partially committed
transaction. better to crash after the first engine has committed,
that'd be much more natural.
> +--let $test_outcome= 2 rows should be present in both tables; no binlog truncation; one extra binlog file compare with A; number of binlogs at reconnect - 4
> + --source binlog_truncate_multi_engine.inc
> +
> +--let $case = C: both engines have committed its transaction branch
you didn't do --let $debug_sync_action = "reset", so the old value
is used. Intentional?
> +# Hold off after both engines have committed. The server is shut down.
> +--let $kill_server=0
> +--let $restart_parameters = --rpl-semi-sync-slave-enabled=1
> +--let $test_outcome= 2 rows should be present in both tables; no binlog truncation; the same # of binlog files as in B; number of binlogs at reconnect - 4
> + --source binlog_truncate_multi_engine.inc
> +
> +
> +
> +DROP TABLE t1, t2;
> +--replace_result $old_max_binlog_size VALUE_AT_START
> +--eval SET @@global.max_binlog_size = $old_max_binlog_size
better use evalp instead of eval, then you won't need replace_result
but really I think you don't need to save/restore max_binlog_size at all,
because you restart the server anyway.
> +
> +--echo # End of the tests
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.inc b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.inc
> new file mode 100644
> index 00000000000..c260a7987e2
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_engine.inc
> @@ -0,0 +1,64 @@
> +#
> +# Loop body of binlog_truncate_multi_engine.test
> +# Parameters:
> +# $debug_sync_action describes debug-sync actions
> +# $kill_server 1 when to crash, 0 for regular restart
> +# $restart_parameters the caller may simulate partial commit at recovery
> +# $test_outcome summary of extected results
> +# $MYSQLD_DATADIR
> +
> +--echo #
> +--echo #
> +--echo # Case $case
> +--echo #
> +RESET MASTER;
> +FLUSH LOGS;
> +SET GLOBAL max_binlog_size= 4096;
> +
> +connect(con1,localhost,root,,);
> +#--connection con1
> +--echo List of binary logs before rotation
> +--source include/show_binary_logs.inc
> +INSERT INTO t1 VALUES (1, REPEAT("x", 1));
> +INSERT INTO t2 VALUES (1, REPEAT("x", 1));
> +BEGIN;
> + INSERT INTO t1 VALUES (2, REPEAT("x", 4100));
> + INSERT INTO t2 VALUES (2, REPEAT("x", 4100));
> +
> +--eval SET DEBUG_SYNC= $debug_sync_action
> +send COMMIT;
> +
> +--connection default
> +SET DEBUG_SYNC= "now WAIT_FOR con1_ready";
> +--echo List of binary logs after rotation
> +--source include/show_binary_logs.inc
> +
> +--echo # restart the server with $restart_parameters
> +if ($kill_server)
> +{
> + --echo # the server is crashed
> + --source include/kill_mysqld.inc
> + --source include/start_mysqld.inc
> +}
> +if (!$kill_server)
> +{
> + --echo # the server is restarted
> + --source include/restart_mysqld.inc
it'd be simpler not to use $kill_server at all. And instead write
let $shutdown_timeout=0;
instead of $kill_server=1;
and
let $shutdown_timeout=;
instead of $kill_server=0;
and above you won't need two if's, you can just do
source include/restart_mysqld.inc;
> +}
> +
> +--connection default
> +--echo #
> +--echo # *** Summary: $test_outcome:
> +--echo #
> +SELECT COUNT(*) FROM t1;
> +SELECT COUNT(*) FROM t2;
> +SELECT @@GLOBAL.gtid_binlog_state;
> +SELECT @@GLOBAL.gtid_binlog_pos;
> +--echo List of binary logs at the end of the tests
> +--source include/show_binary_logs.inc
> +--echo # ***
> +# cleanup
> +DELETE FROM t1;
> +DELETE FROM t2;
> +--disconnect con1
> +--echo #
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_log.test b/mysql-test/suite/binlog/t/binlog_truncate_multi_log.test
> new file mode 100644
> index 00000000000..231e90dbdc9
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_log.test
> @@ -0,0 +1,105 @@
> +# ==== Purpose ====
> +#
> +# Test verifies truncation of multiple binary logs.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 1 - Set max_binlog_size= 4096, to help a series of inserts into a
> +# transaction table 'ti' get binlog rotated so many time while the
> +# transactions won't be committed, being stopped at
> +# a prior to commit debug_sync point
> +# 2 - kill and restart the server as semisync slave successfully to
> +# end with an expected first binlog and the gtid state.
> +#
> +# ==== References ====
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +--source include/have_innodb.inc
> +--source include/have_log_bin.inc
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +
> +--let $old_max_binlog_size= `select @@global.max_binlog_size`
> +SET @@global.max_binlog_size= 4096;
> +
> +RESET MASTER;
> +FLUSH LOGS;
> +CREATE TABLE ti (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CREATE TABLE tm (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=MyISAM;
> +
> +connect(master1,localhost,root,,);
> +--echo "List of binary logs before rotation"
> +--source include/show_binary_logs.inc
> +
> +# Some load to either non- and transactional egines
> +# that should not affect the following recovery:
> +INSERT INTO ti VALUES(1,"I am gonna survive");
> +INSERT INTO tm VALUES(1,"me too!");
> +
> +# hold on near engine commit
> +SET DEBUG_SYNC= "commit_after_release_LOCK_after_binlog_sync SIGNAL master1_ready WAIT_FOR con1_go";
> +--send_eval INSERT INTO ti VALUES (2, REPEAT("x", 4100))
> +
> +connect(master2,localhost,root,,);
> +# The 2nd trx for recovery, it does not rotate binlog
> +SET DEBUG_SYNC= "now WAIT_FOR master1_ready";
> +SET DEBUG_SYNC= "commit_before_get_LOCK_commit_ordered SIGNAL master2_ready WAIT_FOR master2_go";
> +--send_eval INSERT INTO ti VALUES (3, "not gonna survive")
> +
> +--connection default
> +SET DEBUG_SYNC= "now WAIT_FOR master2_ready";
> +--echo "List of binary logs before crash"
> +--source include/show_binary_logs.inc
> +--echo # The gtid binlog state prior the crash will be truncated at the end of the test
> +SELECT @@global.gtid_binlog_state;
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
> +EOF
> +
> +--source include/kill_mysqld.inc
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart: --rpl-semi-sync-slave-enabled=1
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# Check error log for a successful truncate message.
> +let $log_error_= `SELECT @@GLOBAL.log_error`;
> +if(!$log_error_)
> +{
> + # MySQL Server on windows is started with --console and thus
> + # does not know the location of its .err log, use default location
> + let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.1.err;
> +}
> +--let SEARCH_FILE=$log_error_
> +--let SEARCH_RANGE=-50000
> +--let SEARCH_PATTERN=truncated binlog file:.*master.*000002
> +--replace_regex /FOUND [0-9]+/FOUND #/
> +--source include/search_pattern_in_file.inc
> +
> +
> +--echo "One record should be present in table"
> +SELECT count(*) FROM ti;
> +
> +--echo # The truncated gtid binlog state
> +SELECT @@global.gtid_binlog_state;
> +SELECT @@global.gtid_binlog_pos;
> +
> +--echo # Cleanup
> +--replace_result $old_max_binlog_size VALUE_AT_START
> +--eval SET @@global.max_binlog_size = $old_max_binlog_size
> +DROP TABLE ti;
> +
many of my comments above apply to this test too
> +--echo # End of the tests
> diff --git a/mysql-test/suite/binlog/t/binlog_truncate_multi_log_unsafe.test b/mysql-test/suite/binlog/t/binlog_truncate_multi_log_unsafe.test
> new file mode 100644
> index 00000000000..6aba2935fde
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_truncate_multi_log_unsafe.test
> @@ -0,0 +1,142 @@
> +# ==== Purpose ====
> +# The test verifies attempt to recover by the semisync slave server whose
> +# binlog is unsafe for truncation.
> +#
> +# ==== Implementation ====
> +# 2 binlog files are created with the 1st one destined to be the binlog
> +# checkpoint file for recovery.
> +# The final group of events is replication unsafe (myisam INSERT).
> +# Therefore the semisync slave recovery may not.
> +#
> +# Steps:
> +# 0 - Set max_binlog_size= 4096, to help an insert into a
> +# transaction table 'ti' get binlog rotated while the
> +# transaction won't be committed, being stopped at
> +# a prior to commit debug_sync point
> +# 1 - insert into a non-transactional 'tm' table completes with
> +# binary logging as well
> +# 2 - kill and attempt to restart the server as semisync slave that
> +# must produce an expected unsafe-to-recover error
> +# 3 - complete the test with a normal restart that successfully finds and
> +# commits the transaction in doubt.
> +#
> +# ==== References ====
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +
> +--source include/have_innodb.inc
> +--source include/have_log_bin.inc
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +
> +--let $old_max_binlog_size= `select @@global.max_binlog_size`
> +SET @@global.max_binlog_size= 4096;
> +
> +call mtr.add_suppression("Table '.*tm' is marked as crashed and should be repaired");
> +call mtr.add_suppression("Got an error from unknown thread");
> +call mtr.add_suppression("Checking table: '.*tm'");
> +call mtr.add_suppression("Recovering table: '.*tm'");
> +call mtr.add_suppression("Cannot trim the binary log to file");
> +call mtr.add_suppression("Crash recovery failed");
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +call mtr.add_suppression("Found 1 prepared transactions");
> +call mtr.add_suppression("mysqld: Table.*tm.*is marked as crashed");
> +call mtr.add_suppression("Checking table.*tm");
> +
> +RESET MASTER;
> +FLUSH LOGS;
> +CREATE TABLE ti (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CREATE TABLE tm (f INT) ENGINE=MYISAM;
> +
> +--let $row_count = 5
> +--let $i = 3
> +--disable_query_log
> +while ($i)
> +{
> + --eval INSERT INTO ti VALUES ($i, REPEAT("x", 1))
> +--dec $i
> +}
> +--enable_query_log
> +INSERT INTO tm VALUES(1);
> +
> +connect(master1,localhost,root,,);
> +connect(master2,localhost,root,,);
> +connect(master3,localhost,root,,);
> +
> +--connection master1
> +
> +# The 1st trx binlogs, rotate binlog and hold on before committing at engine
> +SET DEBUG_SYNC= "commit_after_release_LOCK_after_binlog_sync SIGNAL master1_ready WAIT_FOR master1_go";
you don't wait for master1_ready anywhere. intentional?
> +--send_eval INSERT INTO ti VALUES ($row_count - 1, REPEAT("x", 4100))
if you plan to insert rows 1,2,3,4,5 then your $i=3 and $row_count=5
are not independent, you should use $i=$row_count-2
(probably $i=`select $row_count-2`),
or don't pretend this test is tunable and use literals instead of variables.
> +
> +--connection master2
> +
> +# The 2nd trx for recovery, it does not rotate binlog
> +SET DEBUG_SYNC= "commit_before_get_LOCK_commit_ordered SIGNAL master2_ready WAIT_FOR master2_go";
> +--send_eval INSERT INTO ti VALUES ($row_count, REPEAT("x", 1))
> +
> +--connection master3
> +SET DEBUG_SYNC= "now WAIT_FOR master2_ready";
> +SET DEBUG_SYNC= "commit_before_get_LOCK_after_binlog_sync SIGNAL master3_ready";
> +--send INSERT INTO tm VALUES (2)
> +
> +--connection default
> +SET DEBUG_SYNC= "now WAIT_FOR master3_ready";
> +--echo # The gtid binlog state prior the crash must be restored at the end of the testSELECT @@global.gtid_binlog_state;
eh? Was this SELECT supposed to be on a separate line?
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
> +EOF
> +
> +SELECT @@global.gtid_binlog_state;
> +--source include/kill_mysqld.inc
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restarts
> +#
> +--echo # Failed restart as the semisync slave
> +--error 1
> +--exec $MYSQLD_LAST_CMD --rpl-semi-sync-slave-enabled=1 >> $MYSQLTEST_VARDIR/log/mysqld.1.err 2>&1
> +
> +--echo # Normal restart
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart:
Okay. *this* is a reasonable use of wait/restart in the expect file.
In other cases, I think, you can just restart, no need to wait,
as you don't do anything while the server is down.
still, here you don't need to write 'wait' to the expect file
and don't need to wait_until_disconnected (kill_mysqld is doing both)
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# Check error log for correct messages.
> +let $log_error_= `SELECT @@GLOBAL.log_error`;
> +if(!$log_error_)
> +{
> + # MySQL Server on windows is started with --console and thus
> + # does not know the location of its .err log, use default location
> + let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.1.err;
> +}
> +--let SEARCH_FILE=$log_error_
> +--let SEARCH_RANGE=-50000
> +--let SEARCH_PATTERN=Cannot trim the binary log to file
> +--replace_regex /FOUND [0-9]+/FOUND #/
> +--source include/search_pattern_in_file.inc
> +
> +--echo # Proof that the in-doubt transactions are recovered by the 2nd normal server restart
> +--eval SELECT COUNT(*) = $row_count as 'True' FROM ti
> +# myisam table may require repair (which is not tested here)
> +--disable_warnings
> +SELECT COUNT(*) <= 1 FROM tm;
> +--enable_warnings
> +
> +--echo # The gtid binlog state prior the crash is restored now
> +SELECT @@GLOBAL.gtid_binlog_state;
> +SELECT @@GLOBAL.gtid_binlog_pos;
> +
> +--echo # Cleanup
> +--replace_result $old_max_binlog_size VALUE_AT_START
> +--eval SET @@global.max_binlog_size = $old_max_binlog_size
see comments elsewhere about $old_max_binlog_size, -50000, $log_error_, etc
> +DROP TABLE ti, tm;
> +--echo End of test
> diff --git a/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.cnf b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.cnf
> new file mode 100644
> index 00000000000..3518eb95b67
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.cnf
> @@ -0,0 +1,13 @@
> +!include suite/rpl/rpl_1slave_base.cnf
> +!include include/default_client.cnf
> +
> +
> +[mysqld.1]
> +log-slave-updates
> +gtid-strict-mode=1
> +max_binlog_size= 4096
> +
> +[mysqld.2]
> +log-slave-updates
> +gtid-strict-mode=1
> +max_binlog_size= 4096
why not an .opt file?
why max_binlog_size here and not at run-time like in other tests?
> diff --git a/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test
> new file mode 100644
> index 00000000000..972aaf2c8b4
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_semi_sync_fail_over.test
> @@ -0,0 +1,144 @@
> +# ==== Purpose ====
> +#
> +# Test verifies replication failover scenario.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Having two servers 1 and 2 enable semi-sync replication with
> +# with the master wait 'after_sync'.
> +# 1 - Insert a row. While inserting second row simulate
> +# a server crash at once the transaction is written to binlog, flushed
> +# and synced but the binlog position is not updated.
> +# 2 - Post crash-recovery on the old master execute there CHANGE MASTER
> +# TO command to connect to server id 2.
> +# 3 - The old master new slave server 1 must connect to the new
> +# master server 2.
> +# 4 - repeat the above to crash the new master and restore in role the old one
> +#
> +# ==== References ====
> +#
> +# MDEV-21117: recovery for --rpl-semi-sync-slave-enabled server
> +
> +
> +--source include/have_innodb.inc
> +--source include/have_log_bin.inc
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_binlog_format_row.inc
> +--let $rpl_topology=1->2
> +--source include/rpl_init.inc
isn't this a normal master-slave.inc topology?
> +
> +--connection server_2
> +--source include/stop_slave.inc
> +
> +--connection server_1
> +RESET MASTER;
> +
> +--connection server_2
> +RESET MASTER;
> +set @@global.rpl_semi_sync_slave_enabled = 1;
> +set @@global.gtid_slave_pos = "";
> +CHANGE MASTER TO master_use_gtid= slave_pos;
> +--source include/start_slave.inc
> +
> +
> +--connection server_1
> +ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
why?
> +set @@global.rpl_semi_sync_master_enabled = 1;
> +set @@global.rpl_semi_sync_master_wait_point=AFTER_SYNC;
> +
> +call mtr.add_suppression("Can.t init tc log");
> +call mtr.add_suppression("Aborting");
> +call mtr.add_suppression("1 client is using or hasn.t closed the table properly");
> +call mtr.add_suppression("Table './mtr/test_suppressions' is marked as crashed and should be repaired");
> +
> +CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +INSERT INTO t1 VALUES (1, 'dummy1');
> +
> +#
> +# CRASH the original master, and FAILOVER to the new
> +#
> +
> +# value 1 for server id 1 -> 2 failover
> +--let $failover_to_slave=1
> +--let $query_to_crash= INSERT INTO t1 VALUES (2, REPEAT("x", 4100))
> +--let $log_search_pattern=truncated binlog file:.*master.*000001
> +--source rpl_semi_sync_crash.inc
> +
> +--connection server_2
> +--let $rows_so_far=3
> +--eval INSERT INTO t1 VALUES ($rows_so_far, 'dummy3')
> +--save_master_pos
> +--echo # The gtid state on current master must be equal to ...
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +--connection server_1
> +--sync_with_master
> +--eval SELECT COUNT(*) = $rows_so_far as 'true' FROM t1
> +--echo # ... the gtid states on the slave:
> +SHOW VARIABLES LIKE 'gtid_slave_pos';
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +--connection server_2
> +#
> +# CRASH the new master and FAILOVER back to the original
> +#
> +
> +# value 0 for the reverse server id 2 -> 1 failover
> +--let $failover_to_slave=0
> +--let $query_to_crash = INSERT INTO t1 VALUES (4, REPEAT("x", 4100))
> +--let $query2_to_crash= INSERT INTO t1 VALUES (5, REPEAT("x", 4100))
> +--let $log_search_pattern=truncated binlog file:.*slave.*000001
> +--source rpl_semi_sync_crash.inc
> +
> +--connection server_1
> +--let $rows_so_far=6
> +--eval INSERT INTO t1 VALUES ($rows_so_far, 'Done')
> +--save_master_pos
> +--echo # The gtid state on current master must be equal to ...
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +--connection server_2
> +--sync_with_master
> +--eval SELECT COUNT(*) = $rows_so_far as 'true' FROM t1
> +--echo # ... the gtid states on the slave:
> +SHOW VARIABLES LIKE 'gtid_slave_pos';
> +SHOW VARIABLES LIKE 'gtid_binlog_pos';
> +
> +
> +--let $diff_tables=server_1:t1, server_2:t1
> +--source include/diff_tables.inc
> +
> +#
> +--echo # Cleanup
> +#
> +--connection server_1
> +DROP TABLE t1;
> +--save_master_pos
> +
> +--connection server_2
> +--sync_with_master
> +--source include/stop_slave.inc
> +
> +--connection server_1
> +set @@global.rpl_semi_sync_master_enabled = 0;
> +set @@global.rpl_semi_sync_slave_enabled = 0;
> +set @@global.rpl_semi_sync_master_wait_point=default;
> +RESET SLAVE;
> +RESET MASTER;
> +
> +--connection server_2
> +set @@global.rpl_semi_sync_master_enabled = 0;
> +set @@global.rpl_semi_sync_slave_enabled = 0;
> +set @@global.rpl_semi_sync_master_wait_point=default;
> +
> +--replace_result $SERVER_MYPORT_1 SERVER_MYPORT_1
> +eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$SERVER_MYPORT_1, master_user='root', master_use_gtid=no;
evalp
> +--source include/start_slave.inc
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_semi_sync_crash.inc b/mysql-test/suite/rpl/t/rpl_semi_sync_crash.inc
> new file mode 100644
> index 00000000000..4289df9155f
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_semi_sync_crash.inc
> @@ -0,0 +1,87 @@
> +if ($failover_to_slave)
> +{
> + --let $server_to_crash=1
> + --let $server_to_promote=2
> + --let $new_master_port=$SERVER_MYPORT_2
> + --let $client_port=$SERVER_MYPORT_1
> +
> + --connect (conn_client,127.0.0.1,root,,test,$SERVER_MYPORT_1,)
> +}
> +if (!$failover_to_slave)
> +{
> + --let $server_to_crash=2
> + --let $server_to_promote=1
> + --let $new_master_port=$SERVER_MYPORT_1
> + --let $client_port=$SERVER_MYPORT_2
> +
> + --connect (conn_client,127.0.0.1,root,,test,$SERVER_MYPORT_2,)
> +}
> +
> +
> +# Hold insert after write to binlog and before "run_commit_ordered" in engine
> +
> +SET DEBUG_SYNC= "commit_after_release_LOCK_after_binlog_sync SIGNAL con1_ready WAIT_FOR con1_go";
> +--send_eval $query_to_crash
> +
> +# complicate recovery with an extra binlog file
> +if (!$failover_to_slave)
> +{
> + --connect (conn_client_2,127.0.0.1,root,,test,$SERVER_MYPORT_2,)
> + # use the same signal with $query_to_crash
> + SET DEBUG_SYNC= "now WAIT_FOR con1_ready";
> + SET DEBUG_SYNC= "commit_after_release_LOCK_lock SIGNAL con1_ready WAIT_FOR con2_go";
both signal con1_ready? which one will you be waiting below then?
> + --send_eval $query2_to_crash
> +}
> +
> +--connection server_$server_to_crash
> +SET DEBUG_SYNC= "now WAIT_FOR con1_ready";
> +--source include/kill_mysqld.inc
> +--source include/wait_until_disconnected.inc
> +
> +--connection server_$server_to_promote
> +--error 2003
> +--source include/stop_slave.inc
> +SELECT @@GLOBAL.gtid_current_pos;
> +
> +--let $_expect_file_name=$MYSQLTEST_VARDIR/tmp/mysqld.$server_to_crash.expect
wasn't $_expect_file_name already set above?
> +--let $restart_parameters=--rpl-semi-sync-slave-enabled=1
> +--let $allow_rpl_inited=1
> +--source include/start_mysqld.inc
> +
> +--connection server_$server_to_crash
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# Check error log for correct messages.
> +let $log_error_= `SELECT @@GLOBAL.log_error`;
> +if(!$log_error_)
> +{
> + # MySQL Server on windows is started with --console and thus
> + # does not know the location of its .err log, use default location
> + let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.$server_to_crash.err;
> +}
> +--let SEARCH_FILE=$log_error_
> +--let SEARCH_RANGE=-50000
> +--let SEARCH_PATTERN=$log_search_pattern
> +--source include/search_pattern_in_file.inc
> +
> +--disconnect conn_client
> +
> +#
> +# FAIL OVER now to new master
> +#
> +--connection server_$server_to_promote
> +set global rpl_semi_sync_master_enabled = 1;
> +set global rpl_semi_sync_master_wait_point=AFTER_SYNC;
> +
> +--connection server_$server_to_crash
> +--let $master_port=$SERVER_MYPORT_2
> +if (`select $server_to_crash = 2`)
> +{
> + --let $master_port=$SERVER_MYPORT_1
> +}
> +--replace_result $SERVER_MYPORT_1 SERVER_MYPORT_1 $SERVER_MYPORT_2 SERVER_MYPORT_2
> +eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$new_master_port, master_user='root', master_use_gtid=SLAVE_POS;
> +set global rpl_semi_sync_slave_enabled = 1;
> +set @@global.gtid_slave_pos=@@global.gtid_binlog_pos;
> +--source include/start_slave.inc
see comments about -50000, $log_error_, wait_until_disconnected, evalp, etc
> diff --git a/sql/sql_class.h b/sql/sql_class.h
> index 140394fefc1..87fa88d4c89 100644
> --- a/sql/sql_class.h
> +++ b/sql/sql_class.h
> @@ -4632,7 +4632,7 @@ class THD :public Statement,
> LF_PINS *tdc_hash_pins;
> LF_PINS *xid_hash_pins;
> bool fix_xid_hash_pins();
> -
> + bool is_1pc_ro_trans;
I don't like how it looks, it's in THD (for every single connection and more)
and it's only used in some exotic cases. It's set in two places around
ha_commit_one_phase(), when it would be better to set it inside
ha_commit_one_phase(), in one place only.
But I cannot suggest how to get rid of it yet as I don't understand what it's
for, I've asked a question about it below, in Gtid_log_event::Gtid_log_event
> /* Members related to temporary tables. */
> public:
> /* Opened table states. */
> diff --git a/sql/handler.h b/sql/handler.h
> index 2a346e8d9d1..bd39f46bf1f 100644
> --- a/sql/handler.h
> +++ b/sql/handler.h
> @@ -873,6 +873,14 @@ typedef struct xid_t XID;
> /* The 'buf' has to have space for at least SQL_XIDSIZE bytes. */
> uint get_sql_xid(XID *xid, char *buf);
>
> +/* struct for heuristic binlog truncate recovery */
it's not "heuristic binlog truncate recovery" anymore,
please don't call it that, it's confusing
> +struct xid_recovery_member
> +{
> + my_xid xid;
> + uint in_engine_prepare; // number of engines that have xid prepared
> + bool decided_to_commit;
> +};
> +
> /* for recover() handlerton call */
> #define MIN_XID_LIST_SIZE 128
> #define MAX_XID_LIST_SIZE (1024*128)
> diff --git a/sql/slave.cc b/sql/slave.cc
> index 28e08e32346..9b3f73e5341 100644
> --- a/sql/slave.cc
> +++ b/sql/slave.cc
> @@ -6944,7 +6945,9 @@ static int queue_event(Master_info* mi,const char* buf, ulong event_len)
> }
> else
> if ((s_id == global_system_variables.server_id &&
> - !mi->rli.replicate_same_server_id) ||
> + (!mi->rli.replicate_same_server_id &&
> + !(semisync_recovery= (rpl_semi_sync_slave_enabled &&
> + mi->using_gtid != Master_info::USE_GTID_NO)))) ||
How can "semisync recovery" code path reach queue_event()?
> event_that_should_be_ignored(buf) ||
> /*
> the following conjunction deals with IGNORE_SERVER_IDS, if set
> diff --git a/sql/log_event.h b/sql/log_event.h
> index 8a342cb5cd3..1588ab85104 100644
> --- a/sql/log_event.h
> +++ b/sql/log_event.h
> @@ -482,6 +482,16 @@ class String;
> */
> #define LOG_EVENT_IGNORABLE_F 0x80
>
> +/**
> + @def LOG_EVENT_ACCEPT_OWN_F
> +
> + Flag sets by the gtid-mode connected semisync slave for
> + the same server_id ("own") events which the slave must not have
> + in its state. Typically such events were never committed by
> + their originator (this server) and discared at its crash recovery
sorry, I failed to understand that
> +*/
> +#define LOG_EVENT_ACCEPT_OWN_F 0x4000
> +
> /**
> @def LOG_EVENT_SKIP_REPLICATION_F
>
> diff --git a/sql/log_event.cc b/sql/log_event.cc
> index e344fc8894f..a593db10d16 100644
> --- a/sql/log_event.cc
> +++ b/sql/log_event.cc
> @@ -7918,6 +7924,7 @@ Gtid_log_event::Gtid_log_event(const char *buf, uint event_len,
> domain_id= uint4korr(buf);
> buf+= 4;
> flags2= *buf;
> + ++buf;
I'd written flags2= *buf++; :)
> if (flags2 & FL_GROUP_COMMIT_ID)
> {
> if (event_len < (uint)header_size + GTID_HEADER_LEN + 2)
> @@ -7925,9 +7932,31 @@ Gtid_log_event::Gtid_log_event(const char *buf, uint event_len,
> seq_no= 0; // So is_valid() returns false
> return;
> }
> - ++buf;
> commit_id= uint8korr(buf);
> + buf+= 8;
> + }
> + /* the extra flags check and actions */
> + if (static_cast<uint>(buf - buf_0) < event_len)
> + {
> + flags_extra= *buf;
> + ++buf;
> + /* extra flags presence is identifed by non-zero byte value at this point */
"extra engines"
> + if (flags_extra & FL_EXTRA_MULTI_ENGINE)
> + {
safety, check that buf+4-buf_0 <= event_len
> + extra_engines= uint4korr(buf);
> + buf += 4;
four bytes for the number of engines participating in a transaction is very
generous... and optimistic. I'd say one byte is more than enough.
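A rough sketch of both points (the bounds check before every read, and a one-byte engine count) follows. It is written as standalone C++ over a plain byte buffer and is not a drop-in replacement for the constructor. `parse_gtid_extra`, `GtidExtra` and the flag value `kFlExtraMultiEngine = 0x01` are hypothetical, since the patch excerpt does not show the real value of FL_EXTRA_MULTI_ENGINE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical flag value; the real FL_EXTRA_MULTI_ENGINE is not shown here.
constexpr std::uint8_t kFlExtraMultiEngine = 0x01;

struct GtidExtra {
  std::uint8_t flags_extra = 0;
  std::uint8_t extra_engines = 0;  // one byte, as suggested above
  bool valid = true;
};

// Parses the optional trailing "extra flags" section starting at offset
// `pos` of an event of `event_len` bytes. Each read is preceded by a
// bounds check, so a truncated event is reported as invalid instead of
// being read past its end.
GtidExtra parse_gtid_extra(const unsigned char *buf_0, std::size_t pos,
                           std::size_t event_len) {
  GtidExtra out;
  if (pos >= event_len)            // no extra section: older event format
    return out;
  out.flags_extra = buf_0[pos++];
  if (out.flags_extra & kFlExtraMultiEngine) {
    if (pos + 1 > event_len) {     // engine-count byte is missing
      out.valid = false;
      return out;
    }
    out.extra_engines = buf_0[pos++];
    if (out.extra_engines == 0)    // the flag promises at least one extra engine
      out.valid = false;
  }
  return out;
}
```

A truncated buffer yields `valid == false` rather than an out-of-bounds read. With a single byte the limit is 255 extra engines.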
> +
> + DBUG_ASSERT(extra_engines > 0);
> + }
> }
> + /*
> + the strict '<' part of the assert corresponds to extra zero-padded
> + trailing bytes,
> + */
> + DBUG_ASSERT(static_cast<uint>(buf - buf_0) <= event_len);
> + /* and the last of them is tested. */
> + DBUG_ASSERT(static_cast<uint>(buf - buf_0) == event_len ||
> + buf_0[event_len - 1] == 0);
> }
>
>
> @@ -7958,6 +7990,22 @@ Gtid_log_event::Gtid_log_event(THD *thd_arg, uint64 seq_no_arg,
> /* Preserve any DDL or WAITED flag in the slave's binlog. */
> if (thd_arg->rgi_slave)
> flags2|= (thd_arg->rgi_slave->gtid_ev_flags2 & (FL_DDL|FL_WAITED));
> + /* count non-zero extra recoverable engines; total = extra + 1 */
> + if (is_transactional)
> + {
> + if (has_xid)
> + {
> + extra_engines=
> + max<uint>(1, ha_count_rw(thd, thd_arg->in_multi_stmt_transaction_mode()))
1. a bit confusing to use both thd and thd_arg in the same line
2. max() is strange. it means the result is 1, when ha_count_rw() is 0.
can ha_count_rw() here even be 0?
3. also it's somewhat an unfounded assumption that only r/w engines will
have the transaction prepared. But we can make it a fact, if we won't
prepare r/o engines at all (in ha_prepare).
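To make point 2 concrete, the patched expression is shown below in isolation (plain C++, not server code). `n` stands for whatever ha_count_rw() returns, and both function names are made up for the comparison. `max<uint>(1, n) - 1` is 0 for both n == 0 and n == 1, so the max() only keeps the unsigned subtraction from wrapping when n == 0.

```cpp
#include <algorithm>
#include <cassert>

// The expression as written in the patch.
unsigned extra_engines_as_patched(unsigned n) {
  return std::max<unsigned>(1, n) - 1;
}

// Same result, with the n == 0 guard spelled out explicitly.
unsigned extra_engines_explicit(unsigned n) {
  return n > 1 ? n - 1 : 0;
}
```

If ha_count_rw() can never be 0 at this point, an assertion would express that better than a silent clamp.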
> + - 1;
> + }
> + else if (unlikely(thd_arg->is_1pc_ro_trans))
> + {
> + extra_engines= UINT_MAX; // neither extra nor base engine
why is that?
> + }
> + if (extra_engines > 0)
> + flags_extra|= FL_EXTRA_MULTI_ENGINE;
> + }
> }
>
>
> diff --git a/sql/handler.cc b/sql/handler.cc
> index 6792e80b8fe..247ab7267bf 100644
> --- a/sql/handler.cc
> +++ b/sql/handler.cc
> @@ -1245,6 +1245,24 @@ int ha_prepare(THD *thd)
> DBUG_RETURN(error);
> }
>
> +/*
> + Returns counted number of
> + read-write recoverable transaction participants.
> +*/
> +uint ha_count_rw(THD *thd, bool all)
the name doesn't match the implementation, please, rename
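The function counts participants that are both read-write and recoverable (they have a recover() method), not all read-write ones. The sketch below illustrates a name that states both conditions, using a mock `Participant` struct in place of Ha_trx_info. The name `count_rw_recoverable_participants` is only a suggestion, not an existing server function.

```cpp
#include <cassert>
#include <vector>

// Mock of the two Ha_trx_info properties the patched loop inspects.
struct Participant {
  bool read_write;   // ha_info->is_trx_read_write()
  bool can_recover;  // ha_info->ht()->recover != NULL
};

// Counts participants that modified data AND can take part in XA recovery.
unsigned count_rw_recoverable_participants(const std::vector<Participant> &list) {
  unsigned n = 0;
  for (const Participant &p : list)
    if (p.read_write && p.can_recover)
      ++n;
  return n;
}
```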
> +{
> + unsigned rw_ha_count= 0;
> + THD_TRANS *trans=all ? &thd->transaction.all : &thd->transaction.stmt;
> +
> + for (Ha_trx_info * ha_info= trans->ha_list; ha_info;
> + ha_info= ha_info->next())
> + {
> + if (ha_info->is_trx_read_write() && ha_info->ht()->recover)
> + ++rw_ha_count;
> + }
> + return rw_ha_count;
> +}
> +
> /**
> Check if we can skip the two-phase commit.
>
> @@ -1960,8 +1982,122 @@ struct xarecover_st
> XID *list;
> HASH *commit_list;
> bool dry_run;
> + MEM_ROOT *mem_root;
> + bool error;
> };
>
> +/**
> + Inserts a new hash member.
> +
> + returns a successfully created and inserted @c xid_recovery_member
> + into hash @c hash_arg,
> + or NULL.
> +*/
> +static xid_recovery_member*
> +xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root)
> +{
> + xid_recovery_member *member= (xid_recovery_member*)
> + alloc_root(ptr_mem_root, sizeof(xid_recovery_member));
> + if (!member)
> + return NULL;
> +
> + member->xid= xid_arg;
> + member->in_engine_prepare= 1;
> + member->decided_to_commit= false;
> +
> + return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
> +}
> +
> +/*
> + Inserts a new or updates an existing hash member to increment
> + the member's prepare counter.
> +
> + returns false on success,
> + true otherwise.
> +*/
> +static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg,
> + MEM_ROOT *ptr_mem_root)
> +{
> + xid_recovery_member* member;
> + if ((member= (xid_recovery_member *)
> + my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg))))
> + member->in_engine_prepare++;
> + else
> + member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root);
> +
> + return member == NULL;
> +}
> +
> +/*
> + Hash iterate function to complete with commit or rollback as decided
> + (typically at binlog recovery processing) in member->in_engine_prepare.
> +*/
> +static my_bool xarecover_do_commit_or_rollback(void *member_arg,
> + void *hton_arg)
> +{
> + xid_recovery_member *member= (xid_recovery_member*) member_arg;
> + handlerton *hton= (handlerton*) hton_arg;
> + xid_t x;
> + my_bool rc;
> +
> + x.set(member->xid);
> + rc= member->decided_to_commit ? hton->commit_by_xid(hton, &x) :
> + hton->rollback_by_xid(hton, &x);
> +
> + DBUG_ASSERT(rc || member->in_engine_prepare > 0);
> +
> + if (!rc)
> + {
I don't think you can trust rc on that.
if it's non-zero, it's an error all right.
but it's a bit of a stretch to presume that
a nonexistent xid is always an error.
also commit_by_xid returns int, not my_bool.
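One way to stop relying on rc for bookkeeping is to remember, while collecting xids, which engines actually returned each xid from recover(). Commit or rollback is then sent only to those engines, and a non-zero int result can be treated as a real error. This is a hypothetical design, not what the patch or the server does. The sketch makes several assumptions:

* engines are identified by a small index, and a 32-bit mask caps them at 32;
* `XidRecoveryMember` and `complete_member` are made-up names;
* the callback stands in for commit_by_xid()/rollback_by_xid().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

struct XidRecoveryMember {
  std::uint64_t xid;
  std::uint32_t engines_prepared;  // bit i set: engine i reported this xid
  bool decided_to_commit;
};

// Finishes one xid on exactly the engines that prepared it.
// Returns the number of engines whose commit/rollback call failed.
int complete_member(XidRecoveryMember &m, unsigned n_engines,
                    const std::function<int(unsigned, std::uint64_t, bool)> &commit_or_rollback) {
  int errors = 0;
  for (unsigned i = 0; i < n_engines; ++i) {
    if (!(m.engines_prepared & (1u << i)))
      continue;                     // engine never had this xid prepared
    int rc = commit_or_rollback(i, m.xid, m.decided_to_commit);
    if (rc != 0)
      ++errors;                     // a genuine failure, not "xid unknown"
    else
      m.engines_prepared &= ~(1u << i);
  }
  return errors;
}
```

With this bookkeeping, whatever is left in `engines_prepared` after the pass is exactly what failed, independent of how an engine reports an unknown xid.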
> + member->in_engine_prepare--;
> + if (global_system_variables.log_warnings > 2)
> + sql_print_warning("%s transaction with xid %llu",
> + member->decided_to_commit ?
> + "Committed" : "Rolled back", (ulonglong) member->xid);
> + }
> +
> + return false;
> +}
> +
> +static my_bool xarecover_do_count_in_prepare(void *member_arg,
> + void *ptr_count)
> +{
> + xid_recovery_member *member= (xid_recovery_member*) member_arg;
> + if (member->in_engine_prepare)
> + {
> + *(uint*) ptr_count += member->in_engine_prepare;
This is rather weird semantics.
it's kind of the number of transactions times the number of engines.
if one transaction wasn't committed in two engines and another
transaction wasn't rolled back in one engine, the counter will be 3.
how are you going to explain this to users?
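The example above can be worked through with a made-up `Leftover` record, standing in for a commit_list member after ha_recover_complete(). Summing in_engine_prepare, as xarecover_do_count_in_prepare() does, counts (transaction, engine) pairs. Counting members with a non-zero counter gives the number of transactions still in doubt, which is probably what users expect to read.

```cpp
#include <cassert>
#include <vector>

struct Leftover {
  unsigned long long xid;
  unsigned in_engine_prepare;  // engines still holding this xid prepared
};

// What the patch accumulates: (transaction, engine) pairs.
unsigned count_pairs(const std::vector<Leftover> &l) {
  unsigned n = 0;
  for (const Leftover &m : l) n += m.in_engine_prepare;
  return n;
}

// Transactions left in doubt, regardless of how many engines hold them.
unsigned count_transactions(const std::vector<Leftover> &l) {
  unsigned n = 0;
  for (const Leftover &m : l)
    if (m.in_engine_prepare > 0) ++n;
  return n;
}
```

In the reviewer's example (one xid left in two engines, another in one), the first function gives 3 and the second gives 2.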
> + if (global_system_variables.log_warnings > 2)
> + sql_print_warning("Found prepared transaction with xid %llu",
> + (ulonglong) member->xid);
> + }
> +
> + return false;
> +}
> +
> +static my_bool xarecover_binlog_truncate_handlerton(THD *unused,
> + plugin_ref plugin,
> + void *arg)
> +{
> + handlerton *hton= plugin_hton(plugin);
> +
> + if (hton->state == SHOW_OPTION_YES && hton->recover)
> + {
> + my_hash_iterate((HASH*) arg, xarecover_do_commit_or_rollback, hton);
Why is the function called xarecover_binlog_truncate_handlerton
if it doesn't truncate anything?
> + }
> +
> + return FALSE;
> +}
> +
> +uint ha_recover_complete(HASH *commit_list)
> +{
> + uint count= 0;
> +
> + plugin_foreach(NULL, xarecover_binlog_truncate_handlerton,
> + MYSQL_STORAGE_ENGINE_PLUGIN, commit_list);
> + my_hash_iterate(commit_list, xarecover_do_count_in_prepare, &count);
> +
> + return count;
> +}
> +
> static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> void *arg)
> {
> @@ -1969,6 +2105,9 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> struct xarecover_st *info= (struct xarecover_st *) arg;
> int got;
>
> + if (info->error)
> + return TRUE;
plugin_foreach() aborts as soon as the callback returns true.
So remove info->error, and return TRUE from xarecover_handlerton
on error instead.
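A rough model of the early-abort contract described here: foreach_until is a hypothetical stand-in for plugin_foreach(), assuming (as stated) that iteration stops as soon as the callback returns true. Because the callback's return value already carries the failure, a separate flag such as info->error is redundant.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model of plugin_foreach(): call `cb` on each item and stop
// as soon as it returns true. Returns true if the iteration was aborted.
bool foreach_until(const std::vector<int> &items,
                   const std::function<bool(int)> &cb) {
  for (int item : items)
    if (cb(item))
      return true;  // abort: later items are never visited
  return false;
}
```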
> +
> if (hton->state == SHOW_OPTION_YES && hton->recover)
> {
> while ((got= hton->recover(hton, info->list, info->len)) > 0 )
> @@ -1988,7 +2127,7 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> }
> #endif /* WITH_WSREP */
>
> - for (int i=0; i < got; i ++)
> + for (int i=0; i < got && !info->error; i ++)
Eh? How can info->error ever be true above?
You break out of the loop when setting info->error= true.
> {
> my_xid x= IF_WSREP(WSREP_ON && wsrep_is_wsrep_xid(&info->list[i]) ?
> wsrep_xid_seqno(info->list[i]) :
> @@ -2013,7 +2152,20 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> info->found_my_xids++;
> continue;
> }
> - // recovery mode
> +
> + /*
> + Regular and semisync slave server recovery only collects
> + xids to make decisions on them later by the caller.
> + */
> + if (info->mem_root)
> + {
> + if (xid_member_replace(info->commit_list, x, info->mem_root))
> + {
> + info->error= true;
> + sql_print_error("Error in memory allocation at xarecover_handlerton");
> + break;
> + }
> + } else
Add something like this to the else branch:
/* this branch is only for the legacy TC_LOG_MMAP */
compile_time_assert(sizeof(TC_LOG_MMAP) > 1 );
> if (IF_WSREP((wsrep_emulate_bin_log &&
> wsrep_is_wsrep_xid(info->list + i) &&
> x <= wsrep_limit), false) ||
> diff --git a/sql/log.cc b/sql/log.cc
> index 8073f09ab88..a9808ed8823 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -7977,6 +7982,7 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
> #endif
> }
>
> + DEBUG_SYNC(leader->thd, "commit_before_update_end_pos");
This isn't used in any tests.
> /*
> update binlog_end_pos so it can be read by dump thread
> Note: must be _after_ the RUN_HOOK(after_flush) or else
> @@ -9609,6 +9618,180 @@ int TC_LOG::using_heuristic_recover()
> /****** transaction coordinator log for 2pc - binlog() based solution ******/
> #define TC_LOG_BINLOG MYSQL_BIN_LOG
>
> +/**
> + Truncates the current binlog to specified position. Removes the rest of binlogs
> + which are present after this binlog file.
> +
> + @param truncate_file Holds the binlog name to be truncated
> + @param truncate_pos Position within binlog from where it needs to
> + truncated.
> +
> + @retval true ok
> + @retval false error
> +
> +*/
> +bool MYSQL_BIN_LOG::truncate_and_remove_binlogs(const char *file_name,
> + my_off_t pos,
> + rpl_gtid *ptr_gtid,
> + enum_binlog_checksum_alg cs_alg)
> +{
> + int error= 0;
> +#ifdef HAVE_REPLICATION
> + LOG_INFO log_info;
> + THD *thd= current_thd;
> + my_off_t index_file_offset= 0;
> + File file= -1;
> + IO_CACHE cache;
> + MY_STAT s;
> +
> + if ((error= find_log_pos(&log_info, file_name, 1)))
> + {
> + sql_print_error("Failed to locate binary log file:%s."
> + "Error:%d", file_name, error);
> + goto end;
> + }
> +
> + while (!(error= find_next_log(&log_info, 1)))
> + {
> + if (!index_file_offset)
> + {
> + index_file_offset= log_info.index_file_start_offset;
> + if ((error= open_purge_index_file(TRUE)))
> + {
> + sql_print_error("Failed to open purge index "
> + "file:%s. Error:%d", purge_index_file_name, error);
> + goto end;
> + }
> + }
> + if ((error= register_purge_index_entry(log_info.log_file_name)))
> + {
> + sql_print_error("Failed to copy %s to purge index"
> + " file. Error:%d", log_info.log_file_name, error);
> + goto end;
> + }
> + }
> +
> + if (error != LOG_INFO_EOF)
> + {
> + sql_print_error("Failed to find the next binlog to "
> + "add to purge index register. Error:%d", error);
> + goto end;
> + }
> +
> + if (is_inited_purge_index_file())
> + {
> + if (!index_file_offset)
> + index_file_offset= log_info.index_file_start_offset;
> +
> + if ((error= sync_purge_index_file()))
> + {
> + sql_print_error("Failed to flush purge index "
> + "file. Error:%d", error);
> + goto end;
> + }
> +
> + // Trim index file
> + if ((error=
> + mysql_file_chsize(index_file.file, index_file_offset, '\n',
> + MYF(MY_WME))) ||
> + (error=
> + mysql_file_sync(index_file.file, MYF(MY_WME|MY_SYNC_FILESIZE))))
> + {
> + sql_print_error("Failed to trim binlog index "
> + "file:%s to offset:%llu. Error:%d", index_file_name,
> + index_file_offset, error);
> + goto end;
> + }
> +
> + /* Reset data in old index cache */
> + if ((error= reinit_io_cache(&index_file, READ_CACHE, (my_off_t) 0, 0, 1)))
> + {
> + sql_print_error("Failed to reinit binlog index "
> + "file. Error:%d", error);
> + goto end;
> + }
> +
> + /* Read each entry from purge_index_file and delete the file. */
> + if ((error= purge_index_entry(thd, NULL, TRUE)))
> + {
> + sql_print_error("Failed to process registered "
> + "files that would be purged.");
> + goto end;
> + }
> + }
> +
> + DBUG_ASSERT(pos);
> +
> + if ((file= mysql_file_open(key_file_binlog, file_name,
> + O_RDWR | O_BINARY, MYF(MY_WME))) < 0)
> + {
> + error= 1;
> + sql_print_error("Failed to open binlog file:%s for "
> + "truncation.", file_name);
> + goto end;
> + }
> + my_stat(file_name, &s, MYF(0));
> +
> + /* Change binlog file size to truncate_pos */
> + if ((error=
> + mysql_file_chsize(file, pos, 0, MYF(MY_WME))) ||
> + (error= mysql_file_sync(file, MYF(MY_WME|MY_SYNC_FILESIZE))))
> + {
> + sql_print_error("Failed to trim the "
> + "binlog file:%s to size:%llu. Error:%d",
> + file_name, pos, error);
> + goto end;
> + }
> + else
> + {
> + char buf[21];
> +
> + longlong10_to_str(ptr_gtid->seq_no, buf, 10);
> + sql_print_information("Successfully truncated binlog file:%s "
> + "to pos:%llu to remove transactions starting from "
> + "GTID %u-%u-%s", file_name, pos,
> + ptr_gtid->domain_id, ptr_gtid->server_id, buf);
> + }
> + if (!(error= init_io_cache(&cache, file, IO_SIZE, WRITE_CACHE,
> + (my_off_t) pos, 0, MYF(MY_WME|MY_NABP))))
> + {
> + /*
> + Write Stop_log_event to ensure clean end point for the binary log being
> + truncated.
> + */
I'm not sure about it. The less you touch the corrupted binlog, the better.
Simply truncating it would be enough, wouldn't it?
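A minimal sketch of the "just truncate" approach, assuming C++17 std::filesystem instead of MariaDB's mysql_file_chsize(); truncate_only is a hypothetical name. It only cuts the file back to pos and appends nothing, no Stop_log_event. std::filesystem offers no fsync, so a real implementation would still need one platform-level sync (for example POSIX fsync()) afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Cut `file` back to `pos` bytes and do nothing else. Returns true on success.
bool truncate_only(const std::filesystem::path &file, std::uintmax_t pos) {
  std::error_code ec;
  std::uintmax_t size = std::filesystem::file_size(file, ec);
  // Refuse a "truncation" past the end, which would extend the file instead.
  if (ec || pos > size)
    return false;
  std::filesystem::resize_file(file, pos, ec);
  return !ec;
}
```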
> + Stop_log_event se;
> + se.checksum_alg= cs_alg;
> + if ((error= write_event(&se, NULL, &cache)))
> + {
> + sql_print_error("Failed to write stop event to "
> + "binary log. Errno:%d",
> + (cache.error == -1) ? my_errno : error);
> + goto end;
> + }
> + clear_inuse_flag_when_closing(cache.file);
> + if ((error= flush_io_cache(&cache)) ||
> + (error= mysql_file_sync(file, MYF(MY_WME|MY_SYNC_FILESIZE))))
> + {
> + sql_print_error("Faild to write stop event to "
> + "binary log. Errno:%d",
> + (cache.error == -1) ? my_errno : error);
> + }
> + }
> + else
> + sql_print_error("Failed to initialize binary log "
> + "cache for writing stop event. Errno:%d",
> + (cache.error == -1) ? my_errno : error);
1. I don't see why you need to sync after every operation.
2. You should clear the in-use flag just before closing; otherwise
flush_io_cache can overwrite it again if the flag happens to be
in the cache.
> +
> +end:
> + if (file >= 0)
> + {
> + end_io_cache(&cache);
> + mysql_file_close(file, MYF(MY_WME));
> + }
> + error= error || close_purge_index_file();
> +#endif
> + return error > 0;
> +}
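A toy model (not MariaDB's IO_CACHE) of point 2 in the comment above: if a write cache still holds a stale copy of the byte carrying the in-use flag, clearing the flag before flushing is undone by the flush. ToyFile, ToyWriteCache and the flag offset are hypothetical. The order that survives is flush first, then clear the flag, then a single sync and close, which also fits point 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: a "file" is a byte vector, and a write cache holds pending
// bytes for one region that flush() copies back into the file.
struct ToyFile {
  std::vector<unsigned char> bytes;
};

struct ToyWriteCache {
  ToyFile *file;
  std::size_t start;                   // file offset of the cached region
  std::vector<unsigned char> pending;  // bytes not yet written to the file

  void flush() {
    for (std::size_t i = 0; i < pending.size(); i++)
      file->bytes[start + i] = pending[i];
    pending.clear();
  }
};

const std::size_t kFlagOffset = 2;  // hypothetical offset of the in-use flag
const unsigned char kInUse = 1;

// Returns the flag byte left in the file once both steps have run.
unsigned char final_flag(bool clear_before_flush) {
  ToyFile f{std::vector<unsigned char>(8, 0)};
  f.bytes[kFlagOffset] = kInUse;
  // The cache happens to hold the header region with the flag still set.
  ToyWriteCache c{&f, 0, std::vector<unsigned char>(f.bytes.begin(),
                                                    f.bytes.begin() + 4)};
  if (clear_before_flush) {
    f.bytes[kFlagOffset] = 0;  // clear first...
    c.flush();                 // ...then the stale cached byte overwrites it
  } else {
    c.flush();                 // write out what is cached first...
    f.bytes[kFlagOffset] = 0;  // ...then clear the flag right before closing
  }
  return f.bytes[kFlagOffset];
}
```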
> int TC_LOG_BINLOG::open(const char *opt_name)
> {
> int error= 1;
> @@ -10040,26 +10223,515 @@ start_binlog_background_thread()
>
> return 0;
> }
> +#ifdef HAVE_REPLICATION
> +class Recovery_context
> +{
> +public:
> + my_off_t prev_event_pos;
> + rpl_gtid last_gtid;
> + bool last_gtid_standalone;
> + bool last_gtid_valid;
> + bool last_gtid_no2pc; // true when the group does not end with Xid event
> + uint last_gtid_engines;
> + std::pair<uint, my_off_t> last_gtid_coord;
Maybe add a comment at the end: // <binlog id, binlog offset> ?
> + /*
> + When true, it's semisync slave recovery mode
> + rolls back transactions in doubt and wipes them off from binlog.
> + The rest of declarations deal with this type of recovery.
> + */
> + bool do_truncate;
> + rpl_gtid binlog_unsafe_gtid, truncate_gtid;
> + char binlog_truncate_file_name[FN_REFLEN] ;
> + char binlog_unsafe_file_name[FN_REFLEN] ;
> + /*
> + When do_truncate is true, the truncate position may not be
> + found in one round when recovered transactions are multi-engine
> + or just on different engines.
> + In the single recoverable engine case `truncate_reset_done` and
> + therefore `truncate_validated` remain `false' when the last
> + binlog is the binlog-checkpoint one.
> + The meaning of `truncate_reset_done` is according to the following example:
> + Let round = 1, Binlog contains the sequence of replication event groups:
> + [g1, G2, g3]
> + where `G` (in capital) stands for committed, `g` for prepared.
> + g1 is first set as truncation candidate, then G2 reset it to indicate
> + the actual truncation is behind (to the right of) it.
> + `truncate_validated` is set to true when `binlog_truncate_pos` (as of g3)
> + won't change.
> + Observe last_gtid_valid is affected, so in the above example g1
> + would have to be discarded from the 1st round binlog state estimate which
> + is handled by the 2nd round (see below).
I failed to understand that either :(
> + */
> + bool truncate_validated; // trued when the truncate position settled
> + bool truncate_reset_done; // trued when the position is to reevaluate
Nor that :(
> + /* Flags the fact of truncate position estimation is done the 1st round */
> + bool truncate_set_in_1st;
> + /*
> + Monotonically indexes binlog files in the recovery list.
> + When the list is "likely" singleton the value is UINT_MAX.
> + Otherwise enumeration starts with zero for the first file, increments
> + by one for any next file except for the last file in the list, which
> + is also the initial binlog file for recovery,
> + that is enumberated with UINT_MAX.
> + */
> + uint id_binlog;
> + enum_binlog_checksum_alg cs_alg; // for Stop_event with do_truncate
Rename it to checksum_alg, please.
"cs_alg" reads like "charset algorithm", which is very puzzling,
and "cs" almost everywhere means CHARSET_INFO.
> + bool single_binlog;
> + std::pair<uint, my_off_t> binlog_truncate_coord;
> + std::pair<uint, my_off_t> binlog_unsafe_coord;
> +
> + Recovery_context();
> + bool decide_or_assess(xid_recovery_member *member, int round,
> + Format_description_log_event *fdle,
> + LOG_INFO *linfo, my_off_t pos);
> + void process_gtid(int round, Gtid_log_event *gev, LOG_INFO *linfo);
> + int next_binlog_or_round(int& round,
> + const char *last_log_name,
> + const char *binlog_checkpoint_name,
> + LOG_INFO *linfo, MYSQL_BIN_LOG *log);
> + bool is_safe()
What do you mean by "safe"?
> + {
> + return !do_truncate ? true :
> + (truncate_gtid.seq_no == 0 || // no truncate
> + binlog_unsafe_coord < binlog_truncate_coord); // or unsafe is earlier
> + }
> + bool complete(MYSQL_BIN_LOG *log, HASH &xids);
> + void update_binlog_unsafe_coord_if_needed(LOG_INFO *linfo);
> + void reset_truncate_coord(my_off_t pos);
> + void set_truncate_coord(LOG_INFO *linfo, int round,
> + enum_binlog_checksum_alg fd_cs_alg);
> +};
>
Comments about what each method does?
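Regarding the coordinate comment and the "safe" question above, here is a commented restatement of is_safe() as a free function. It assumes that binlog_unsafe_coord tracks the last standalone or non-2PC group found after the truncation point, which the patch treats as unsafe to truncate (see update_binlog_unsafe_coord_if_needed()). BinlogCoord and truncation_is_safe are hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <utility>

// <binlog id, binlog offset>, as in the patch's std::pair<uint, my_off_t>.
// std::pair compares lexicographically: the binlog id decides, and offsets
// are only compared when both coordinates are in the same binlog file.
using BinlogCoord = std::pair<unsigned, std::uint64_t>;

// Restatement of Recovery_context::is_safe(); parameter names are
// hypothetical. Truncation counts as "safe" when there is nothing to
// truncate (seq_no 0), or when the last unsafe group lies strictly before
// the truncation point, so cutting the binlog there cannot remove it.
bool truncation_is_safe(bool do_truncate, std::uint64_t truncate_seq_no,
                        const BinlogCoord &unsafe,
                        const BinlogCoord &truncate) {
  if (!do_truncate)
    return true;
  return truncate_seq_no == 0 || unsafe < truncate;
}
```

Because id_binlog is UINT_MAX for the last (initial) binlog file, any coordinate in that file compares greater than coordinates in earlier files, whatever the offset.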
> +bool Recovery_context::complete(MYSQL_BIN_LOG *log, HASH &xids)
> +{
> + if (!do_truncate || is_safe())
> + {
> + uint count_in_prepare= ha_recover_complete(&xids);
> + if (count_in_prepare > 0 && global_system_variables.log_warnings > 2)
> + {
> + sql_print_warning("Could not complete %u number of transactions in "
> + "engines.", count_in_prepare);
> + return false; // there's later dry run ha_recover() to error out
> + }
> + }
> +
> + /* Truncation is not done when there's no transaction to roll back */
> + if (do_truncate && truncate_gtid.seq_no > 0)
> + {
> + if (is_safe())
> + {
> + if (log->truncate_and_remove_binlogs(binlog_truncate_file_name,
> + binlog_truncate_coord.second,
> + &truncate_gtid,
> + cs_alg))
> + {
> + sql_print_error("Failed to trim the binary log to "
> + "file:%s pos:%llu.", binlog_truncate_file_name,
> + binlog_truncate_coord.second);
> + return true;
> + }
> + }
> + else
> + {
> + sql_print_error("Cannot trim the binary log to file:%s "
> + "pos:%llu as unsafe statement "
> + "is found at file:%s pos:%llu which is "
> + "beyond the truncation position "
> + "(the farest in binlog order only is reported); "
"farest" doesn't look like a word
> + "all transactions in doubt are left intact. ",
> + binlog_truncate_file_name, binlog_truncate_coord.second,
> + binlog_unsafe_file_name, binlog_unsafe_coord.second);
> + return true;
Will this abort server startup?
> + }
> + }
> +
> + return false;
> +}
> +
> +Recovery_context::Recovery_context() :
> + prev_event_pos(0),
> + last_gtid_standalone(false), last_gtid_valid(false), last_gtid_no2pc(false),
> + last_gtid_engines(0),
> + do_truncate(rpl_semi_sync_slave_enabled),
> + truncate_validated(false), truncate_reset_done(false),
> + truncate_set_in_1st(false), id_binlog(UINT_MAX),
> + cs_alg(BINLOG_CHECKSUM_ALG_UNDEF), single_binlog(false)
> +{
> + last_gtid_coord= std::pair<uint,my_off_t>(0,0);
> + binlog_truncate_coord= std::pair<uint,my_off_t>(0,0);
> + binlog_unsafe_coord= std::pair<uint,my_off_t>(0,0);
> + binlog_truncate_file_name[0]= 0;
> + binlog_unsafe_file_name [0]= 0;
> + binlog_unsafe_gtid= truncate_gtid= rpl_gtid();
> +}
> +
> +/**
> + Is called when a committed or to-be-committed transaction is detected.
> + @c truncate_gtid is set to "nil" with its @c rpl_gtid::seq_no := 0.
> + @c truncate_reset_done remembers the fact of that has been done at least
> + once in the current round;
> + @c binlog_truncate_coord is "suggested" to a next group start to indicate
> + the actual settled value must be at most as the last suggested one.
> +*/
> +void Recovery_context::reset_truncate_coord(my_off_t pos)
> +{
> + DBUG_ASSERT(binlog_truncate_coord.second == 0 ||
> + last_gtid_coord >= binlog_truncate_coord ||
> + truncate_set_in_1st);
> +
> + binlog_truncate_coord= std::pair<uint,my_off_t>(id_binlog, pos);
> + truncate_gtid= rpl_gtid();
> + truncate_reset_done= true;
> +}
> +
> +
> +/*
> + Sets binlog_truncate_pos to the value of the current transaction's gtid.
> + In multi-engine case that might be just an assessment to be exacted
> + in the current round and confirmed in a next one.
> +*/
> +void Recovery_context::set_truncate_coord(LOG_INFO *linfo, int round,
> + enum_binlog_checksum_alg fd_cs_alg)
> +{
> + binlog_truncate_coord= last_gtid_coord;
> + strmake_buf(binlog_truncate_file_name, linfo->log_file_name);
> +
> + truncate_gtid= last_gtid;
> + cs_alg= fd_cs_alg;
> + truncate_set_in_1st= (round == 1);
> +}
> +
> +bool Recovery_context::decide_or_assess(xid_recovery_member *member, int round,
> + Format_description_log_event *fdle,
> + LOG_INFO *linfo, my_off_t pos)
> +{
> + if (member)
> + {
> + DBUG_EXECUTE_IF("binlog_truncate_partial_commit",
> + if (last_gtid_engines == 2)
> + {
> + DBUG_ASSERT(member->in_engine_prepare > 0);
> + member->in_engine_prepare= 1;
> + });
> + /*
> + xid in doubt are resolved as follows:
> + in_engine_prepare is compared agaist binlogged info to
> + yield the commit-or-rollback decision in the normal case.
> + In the semisync-slave recovery the decision may be
> + approximate to change in later rounds.
> + */
> + if (member->in_engine_prepare > last_gtid_engines)
> + {
> + char buf[21];
> + longlong10_to_str(last_gtid.seq_no, buf, 10);
> + sql_print_error("Error to recovery multi-engine transaction: "
> + "the number of engines prepared %u exceeds the "
> + "respective number %u in its GTID %u-%u-%s "
> + "located at file:%s pos:%llu",
> + member->in_engine_prepare, last_gtid_engines,
> + last_gtid.domain_id, last_gtid.server_id, buf,
> + linfo->log_file_name, last_gtid_coord.second);
> + return true;
> + }
> + else if (member->in_engine_prepare < last_gtid_engines)
> + {
> + DBUG_EXECUTE_IF("binlog_truncate_partial_commit",
> + member->in_engine_prepare= 2;);
> + DBUG_ASSERT(member->in_engine_prepare > 0);
> + /*
> + This is an "unlikely" branch of two or more engines in transaction
> + that is partially committed, so to complete.
> + */
> + member->decided_to_commit= true;
> + if (do_truncate)
> + {
> + /*
> + A validated truncate pos may not change later.
> + Any "unlikely" (two or more engines in transaction) reset
> + to not-validated yet position ensues correcting early
> + estimates in the following round(s).
> + */
> + if (!truncate_validated)
> + reset_truncate_coord(pos);
> + }
> + }
> + else // member->in_engine_prepare == last_gtid_engines
> + {
> + if (!do_truncate) // "normal" recovery
> + {
> + member->decided_to_commit= true;
> + }
> + else
> + {
> + /*
> + The first time or further estimate the truncate position
> + until validation. Already set not validated yet postion
> + may change only through previous reset
> + unless this xid in doubt is the first in the 2nd round.
> + */
> + if (!truncate_validated)
> + {
> + DBUG_ASSERT(round <= 2);
> + /*
> + Either truncate was not set or was reset, else
> + it gets incremented, otherwise it may only set to an earlier
> + offset only at the turn out of the 1st round.
Sorry, I cannot parse that :(
> + */
> + DBUG_ASSERT(truncate_gtid.seq_no == 0 ||
> + last_gtid_coord >= binlog_truncate_coord ||
> + (round == 2 && truncate_set_in_1st));
> +
> + last_gtid_valid= false; // may still (see gtid) flip later
> + if (truncate_gtid.seq_no == 0 /* was reset or never set */ ||
> + (truncate_set_in_1st && round == 2 /* reevaluted at round turn */))
> + set_truncate_coord(linfo, round, fdle->checksum_alg);
> +
> + DBUG_ASSERT(member->decided_to_commit == false); // may flip later
> + }
> + else
> + {
> + DBUG_ASSERT(!truncate_reset_done); // the position was settled
> + /*
> + Correct earlier decisions of 1st and/or 2nd round to
> + rollback and invalidate last_gtid in binlog state.
> + */
> + if (!(member->decided_to_commit=
> + last_gtid_coord < binlog_truncate_coord))
> + {
> + last_gtid_valid= false; // settled
> + if (truncate_gtid.seq_no == 0)
> + truncate_gtid= last_gtid;
> + }
> + }
> + }
> + }
> + }
> + else if (do_truncate) // "0" < last_gtid_engines
> + {
> + /*
> + Similar to the multi-engine partial commit of "then" branch above.
> + The 2nd condition economizes away an extra (3rd) round in
I don't think "economizes" is a word
> + expected cases of first xid:s in the binlog checkpoint file
> + are actually of committed transactions. So fully committed
> + xid sequence is passed in this branch without any action.
> + */
> + if (!truncate_validated)
> + reset_truncate_coord(pos);
> + }
> +
> + return false;
> +}
> +
> +/*
> + Is invoked when a standalone or non-2pc group is detected.
> + Both are unsafe to truncate in the semisync-slave recovery so
> + the maximum unsafe coordinate may be updated.
> + In the non-2pc group case though, *exeptionally*,
> + the no-engine group is considered safe, to be invalidated
> + to not contribute to binlog state.
> +*/
> +void Recovery_context::update_binlog_unsafe_coord_if_needed(LOG_INFO *linfo)
> +{
> + if (!do_truncate)
> + return;
> +
> + if (truncate_gtid.seq_no > 0 && // g1,U2, *not* G1,U2
> + last_gtid_coord > binlog_truncate_coord)
> + {
> + DBUG_ASSERT(binlog_truncate_coord.second > 0);
> + /*
> + Potentially unsafe when the truncate coordinate is not determined,
> + just detected as unsafe when behind the latter.
> + */
> + if (last_gtid_engines == 0)
> + {
> + last_gtid_valid= false;
> + }
> + else
> + {
> + binlog_unsafe_gtid= last_gtid;
> + binlog_unsafe_coord= last_gtid_coord;
> + strmake_buf(binlog_unsafe_file_name, linfo->log_file_name);
> + }
> + }
> +}
> +
> +/*
> + Assigns last_gtid and assesses the maximum (in the binlog offset term)
> + unsafe gtid (group of events).
> +*/
> +void Recovery_context::process_gtid(int round, Gtid_log_event *gev,
> + LOG_INFO *linfo)
> +{
> + last_gtid.domain_id= gev->domain_id;
> + last_gtid.server_id= gev->server_id;
> + last_gtid.seq_no= gev->seq_no;
> + last_gtid_engines= gev->extra_engines != UINT_MAX ?
> + gev->extra_engines + 1 : 0;
> + last_gtid_coord= std::pair<uint,my_off_t>(id_binlog, prev_event_pos);
> + if (round == 1 || do_truncate)
> + {
> + DBUG_ASSERT(!last_gtid_valid);
> +
> + last_gtid_no2pc= false;
> + last_gtid_standalone=
> + (gev->flags2 & Gtid_log_event::FL_STANDALONE) ? true : false;
> + if (do_truncate && last_gtid_standalone)
> + update_binlog_unsafe_coord_if_needed(linfo);
> + /* Update the binlog state with any 'valid' GTID logged after Gtid_list. */
> + last_gtid_valid= true; // may flip at Xid when falls to truncate
> + }
> +}
> +
> +/*
> + At the end of processing of the current binlog compute next action.
> + When round increments in the semisync-slave recovery
> + truncate_validated, truncate_reset_done
> + gets reset/set for the next round,
> + as well as id_binlog gets reset either to zero or UINT_MAX
> + when recovery deals with the only binlog file.
> +
> + @param[in,out] round the current round that may increment here
> + @param last_log_name the recovery starting binlog file
> + @param binlog_checkpoint_name
> + binlog checkpoint file
> + @param linfo binlog file list struct for next file
> + @param log pointer to mysql_bin_log instance
> + @return 0 when rounds continue, maybe the current one remains
> + 1 when all rounds are done
> + -1 on error
> +*/
> +int Recovery_context::next_binlog_or_round(int& round,
> + const char *last_log_name,
> + const char *binlog_checkpoint_name,
> + LOG_INFO *linfo,
> + MYSQL_BIN_LOG *log)
> +{
> + if (!strcmp(linfo->log_file_name, last_log_name))
> + {
> + /* Either exit the loop now, or increment round */
> + DBUG_ASSERT(round <= 3);
> +
> + if ((round <= 2 && likely(!truncate_reset_done)) || round == 3)
> + {
> + // Exit now as the 1st round ends up with the binlog-checkpoint file
> + // is the same as the initial binlog file, so already parsed; or
> + // the 2nd round has not made own truncate_reset_done; or
> + // at the end of the 3rd "truncate" round.
> + DBUG_ASSERT(do_truncate || round == 1);
> + // The 3rd round is done only when the 2nd trued truncate_reset_done
> + // and consequently truncate_validated.
> + // No instances of truncate_reset_done in first 2 rounds allows
> + // for exiting now.
Our (unwritten) coding style says that multi-line comments should use /*...*/.
> + DBUG_ASSERT(round < 3 || truncate_validated);
> +
> + return true;
> + }
> + else
> + {
> + DBUG_ASSERT(do_truncate);
> + /*
> + the last binlog file, having truncate_reset_done to indicate
> + needed correction to member->decided_to_commit to reflect
> + changed binlog_truncate_pos.
> + */
> + truncate_reset_done= false;
> + truncate_validated= true;
> + rpl_global_gtid_binlog_state.reset_nolock();
> +
> + if (round > 1 &&
> + log->find_log_pos(linfo, binlog_checkpoint_name, 1))
> + {
> + sql_print_error("Binlog file '%s' not found in binlog index, needed "
> + "for recovery. Aborting.", binlog_checkpoint_name);
> + return -1;
> + }
> + id_binlog= (single_binlog= !strcmp(linfo->log_file_name, last_log_name)) ?
> + UINT_MAX : 0;
> + round++;
> + }
> + }
> + else if (round == 1)
> + {
> + if (do_truncate)
> + {
> + truncate_validated= truncate_reset_done;
> + rpl_global_gtid_binlog_state.reset_nolock();
> + truncate_reset_done= false;
> + id_binlog= 0;
> +
> + DBUG_ASSERT(!single_binlog);
> + }
> + round++;
> + }
> + else
> + {
> + /*
> + NOTE: reading other binlog's FD is necessary for finding out
> + the checksum status of the respective binlog file.
> + Round remains in this branch.
> + */
> + if (log->find_next_log(linfo, 1))
> + {
> + sql_print_error("Error reading binlog files during recovery. "
> + "Aborting.");
> + return -1;
> + }
> + if (!strcmp(linfo->log_file_name, last_log_name))
> + {
> + if (do_truncate)
> + {
> + DBUG_ASSERT(!single_binlog);
> + id_binlog= UINT_MAX;
> + }
> + else
> + return 1; // regular recovery exit
> + }
> + else if (do_truncate)
> + id_binlog++;
> +
> + DBUG_ASSERT(id_binlog <= UINT_MAX); // the assert is "practical"
> + }
> +
> + DBUG_ASSERT(!single_binlog ||
> + !strcmp(linfo->log_file_name, binlog_checkpoint_name));
> +
> + return 0;
> +}
> +#endif
>
> int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> IO_CACHE *first_log,
> - Format_description_log_event *fdle, bool do_xa)
> + Format_description_log_event *fdle_arg, bool do_xa)
> {
> Log_event *ev= NULL;
> HASH xids;
> MEM_ROOT mem_root;
> char binlog_checkpoint_name[FN_REFLEN];
> bool binlog_checkpoint_found;
> - bool first_round;
> IO_CACHE log;
> File file= -1;
> const char *errmsg;
> + Format_description_log_event *fdle= fdle_arg;
> #ifdef HAVE_REPLICATION
> - rpl_gtid last_gtid;
> - bool last_gtid_standalone= false;
> - bool last_gtid_valid= false;
> + Recovery_context ctx;
> #endif
> + /*
> + The for-loop variable is updated by the following rule set:
> + Initially set to 1.
> + After the initial binlog file is processed to identify
> + the Binlog-checkpoint file it is incremented when the latter file
> + is different from the initial one. Otherwise the only log has been
> + fully parsed and the loop exits.
> + The 2nd starts with the Binlog-checkpoint file and ends when the initial
> + binlog file is reached. It is excluded from yet another processing
> + in the "normal" non-semisync-slave configuration, and then the loop is
> + done at this point. Otherwise in the semisync slave case it may be parsed
> + over again. The 2nd round may turn to a third in "unlikely" condition of the
> + semisync-slave is being recovered having multi- or different engine
> + transactions in doubt.
> + */
I don't understand that. Why is the semisync-slave case special?
> + int round;
>
> if (! fdle->is_valid() ||
> (do_xa && my_hash_init(&xids, &my_charset_bin, TC_LOG_PAGE_SIZE/3, 0,
> @@ -10072,39 +10744,63 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
>
> fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error
>
> + /* finds xids when root is not NULL */
> + if (do_xa && ha_recover(&xids, &mem_root))
> + goto err1;
> +
> /*
> Scan the binlog for XIDs that need to be committed if still in the
> prepared stage.
>
> Start with the latest binlog file, then continue with any other binlog
> files if the last found binlog checkpoint indicates it is needed.
> +
> + In case the semisync slave recovery the 2nd round may include
> + the initial file as well as cause a 3rd round when transactions
> + with multiple engines are discovered.
> + Additionally to find and decide on transactions in doubt that
> + the semisync slave may need to roll back, the binlog can be truncated
> + in the end to reflect the rolled back decisions.
> */
>
> binlog_checkpoint_found= false;
> - first_round= true;
> - for (;;)
> + for (round= 1;;)
> {
> - while ((ev= Log_event::read_log_event(first_round ? first_log : &log,
> + while ((ev= Log_event::read_log_event(round == 1 ? first_log : &log,
> fdle, opt_master_verify_checksum))
> && ev->is_valid())
> {
> enum Log_event_type typ= ev->get_type_code();
> switch (typ)
> {
> + case FORMAT_DESCRIPTION_EVENT:
> + if (round > 1)
> + {
> + if (fdle != fdle_arg)
> + delete fdle;
> + fdle= (Format_description_log_event*) ev;
Why is it needed? All binlog files should have the same
Format_description_log_event anyway.
> + }
> + break;
> case XID_EVENT:
> + if (do_xa)
> {
> - if (do_xa)
> + xid_recovery_member *member=
> + (xid_recovery_member*)
> + my_hash_search(&xids, (uchar*) &static_cast<Xid_log_event*>(ev)->xid,
> + sizeof(my_xid));
> +#ifndef HAVE_REPLICATION
> {
> - Xid_log_event *xev=(Xid_log_event *)ev;
> - uchar *x= (uchar *) memdup_root(&mem_root, (uchar*) &xev->xid,
> - sizeof(xev->xid));
> - if (!x || my_hash_insert(&xids, x))
> - goto err2;
> + if (member)
> + member->decided_to_commit= true;
> }
> - break;
> +#else
> + if (ctx.decide_or_assess(member, round, fdle, linfo, ev->log_pos))
> + goto err2;
> +#endif
> }
> + break;
> case BINLOG_CHECKPOINT_EVENT:
> - if (first_round && do_xa)
> + if (round == 1 && do_xa)
> {
> size_t dir_len;
> Binlog_checkpoint_log_event *cev= (Binlog_checkpoint_log_event *)ev;
> @@ -10124,9 +10820,18 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> }
> }
> break;
> +#ifdef HAVE_REPLICATION
> case GTID_LIST_EVENT:
> - if (first_round)
> + if (round == 1 || (ctx.do_truncate &&
> + (ctx.id_binlog == 0 || ctx.single_binlog)))
> {
> + /*
> + Unlike the normal case, in do_truncate the initial state is
> + in the first binlog file of the recovery list.
> + */
I don't understand that either :(
> + DBUG_ASSERT(!ctx.do_truncate || !ctx.single_binlog ||
> + ctx.id_binlog == UINT_MAX);
> +
> Gtid_list_log_event *glev= (Gtid_list_log_event *)ev;
>
> /* Initialise the binlog state from the Gtid_list event. */
> @@ -10135,19 +10840,16 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> }
> break;
>
> -#ifdef HAVE_REPLICATION
> case GTID_EVENT:
> - if (first_round)
> + ctx.process_gtid(round, (Gtid_log_event *)ev, linfo);
> + break;
> +
> + case QUERY_EVENT:
> + if (((Query_log_event *)ev)->is_commit() ||
> + ((Query_log_event *)ev)->is_rollback())
> {
> - Gtid_log_event *gev= (Gtid_log_event *)ev;
> -
> - /* Update the binlog state with any GTID logged after Gtid_list. */
> - last_gtid.domain_id= gev->domain_id;
> - last_gtid.server_id= gev->server_id;
> - last_gtid.seq_no= gev->seq_no;
> - last_gtid_standalone=
> - ((gev->flags2 & Gtid_log_event::FL_STANDALONE) ? true : false);
> - last_gtid_valid= true;
> + ctx.last_gtid_no2pc= true;
> + ctx.update_binlog_unsafe_coord_if_needed(linfo);
What about DML on non-transactional tables? It won't have COMMIT/ROLLBACK, will it?
> }
> break;
> #endif
> @@ -10217,35 +10923,43 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> file= -1;
> }
>
> - if (!strcmp(linfo->log_file_name, last_log_name))
> - break; // No more files to do
> +#ifdef HAVE_REPLICATION
> + int rc= ctx.next_binlog_or_round(round, last_log_name,
> + binlog_checkpoint_name, linfo, this);
> + if (rc == -1)
> + goto err2;
> + else if (rc == 1)
> + break; // all rounds done
> +#else
> + if (!strcmp(linfo->log_file_name, last_log_name))
> + break; // No more files to do
indentation
> +#endif
> +
> if ((file= open_binlog(&log, linfo->log_file_name, &errmsg)) < 0)
> {
> sql_print_error("%s", errmsg);
> goto err2;
> }
> - /*
> - We do not need to read the Format_description_log_event of other binlog
> - files. It is not possible for a binlog checkpoint to span multiple
> - binlog files written by different versions of the server. So we can use
> - the first one read for reading from all binlog files.
> - */
> - if (find_next_log(linfo, 1))
> - {
> - sql_print_error("Error reading binlog files during recovery. Aborting.");
> - goto err2;
> - }
> fdle->reset_crypto();
> - }
> + } // end of for
>
> if (do_xa)
> {
> - if (ha_recover(&xids))
> - goto err2;
> -
> + if (binlog_checkpoint_found)
> + {
> +#ifndef HAVE_REPLICATION
> + if (ha_recover_complete(&xids))
> +#else
> + if (ctx.complete(this, xids))
> +#endif
> + goto err2;
> + }
> free_root(&mem_root, MYF(0));
> my_hash_free(&xids);
> }
> + if (fdle != fdle_arg)
> + delete fdle;
> +
> return 0;
>
> err2:
Regards,
Sergei
VP of MariaDB Server Engineering
and security(a)mariadb.org
[Maria-developers] Systems Integration and Raising of the Abstraction Level
by Alejandro Sánchez 25 Feb '21
We have highly evolved systems such as SQL, HTTP, HTML, file formats and
high-level programming languages such as Java or PHP that allow us to
program many things with little code. Even so, a lot of effort goes
into integrating these systems. Libraries and frameworks are created to
reduce this problem, and they help in some ways, but the integration is
still not complete.
It is well known that, most of the time, when you try to create
something that integrates several incompatible systems, what you end up
with is yet another incompatible system :). Still, I think integrating
the systems mentioned above is very important; it could be a great step
in the evolution of computing and is worth a try.
To explore what this integration could look like, I have created a
framework called NextTypes whose main objective is the integration of
data types. To me it is very illogical that something as basic as a
16-bit integer has a different name in each system ("smallint", "short",
"number") and may also be signed or unsigned. At any moment, because of a
programmer's mistake, a number in the programming language may not fit in
the database column. Besides, these names say little about the type's
characteristics; it would be clearer to use, for example, "int16".
Whatever names are chosen, the most important thing is that all systems
use the same names for types with the same characteristics.
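For illustration, here is a minimal Java sketch of the int16 idea, assuming a signed SMALLINT column (range -32768 to 32767, the same as Java's "short"). The class name Int16 is hypothetical, not part of NextTypes or the JDK; it checks the range once in application code instead of waiting for the database to reject the value.

```java
/**
 * Hypothetical "int16" helper (not a NextTypes or JDK API). Java calls this
 * type "short" and SQL calls it SMALLINT; both are 16-bit signed integers.
 */
final class Int16 {
    static final int MIN = Short.MIN_VALUE; // -32768
    static final int MAX = Short.MAX_VALUE; //  32767

    private Int16() {}

    /** Narrows an int to 16 bits, throwing instead of silently wrapping around. */
    static short of(int value) {
        if (value < MIN || value > MAX) {
            throw new ArithmeticException("value " + value + " does not fit in int16");
        }
        return (short) value;
    }
}
```

Int16.of(32767) returns 32767 and Int16.of(40000) throws, whereas a plain cast (short) 40000 silently yields -25536.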
There is also no standard way to define composite data types made of
primitive types and other composite types. Moving data from an HTML form
to an SQL table, or from one application to another, requires multiple
transformations. The lack of integration also lowers the level of
abstraction, forcing developers to do a lot of low-level work to make the
systems fit together.
At the moment NextTypes is nothing more than another system incompatible
with the others; it simply integrates them to a large extent and raises
the level of abstraction. What I would really like is for the features
that make up NextTypes to be included in the systems it integrates.
Finally, I would like to list some examples of improvements to database
managers, SQL, HTTP, HTML, browsers and programming languages that would
help integrate these systems and raise the level of abstraction. Some of
these enhancements are already included in NextTypes and other
frameworks.
SQL
---
- Custom metadata in tables and columns.
- Creation and modification dates of rows.
- Creation and modification dates of table definitions.
- Use of table and column names in prepared statements.
Example: select # from # where # = ?;
- Use of arrays in prepared statements.
Example: select # from article where id in (?);
# = author,title
? = 10,24,45
- Standardization of the ranges of valid values and the resolution of the
date, time and datetime types across database managers and the HTML time element.
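Standard JDBC placeholders bind only values, not table or column names or whole arrays. As a rough sketch of how the proposed "#" and array "?" placeholders can be emulated today, the hypothetical helper below (not a JDBC or NextTypes API) validates identifiers against an assumed lower-case naming rule before inlining them, and expands "in (?)" into one "?" per value.

```java
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical helper emulating "#" identifier placeholders and array "?"
 * placeholders on top of plain JDBC, which can bind only single values.
 */
final class SqlTemplate {
    // Assumed identifier rule: lower-case letters, digits and underscores.
    private static final Pattern IDENTIFIER = Pattern.compile("[a-z_][a-z0-9_]*");

    private SqlTemplate() {}

    /** Rejects anything that is not a plain identifier, so it is safe to inline. */
    static String identifier(String name) {
        if (!IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("invalid identifier: " + name);
        }
        return name;
    }

    /** Builds "select c1, c2 from table where key in (?, ?, ...)". */
    static String selectIn(List<String> columns, String table, String key, int valueCount) {
        if (columns.isEmpty() || valueCount < 1) {
            throw new IllegalArgumentException("need at least one column and one value");
        }
        String columnList = columns.stream()
                .map(SqlTemplate::identifier)
                .collect(Collectors.joining(", "));
        // One "?" marker per array element.
        String markers = String.join(", ", Collections.nCopies(valueCount, "?"));
        return "select " + columnList + " from " + identifier(table)
                + " where " + identifier(key) + " in (" + markers + ")";
    }
}
```

For the example above, selectIn(List.of("author", "title"), "article", "id", 3) produces "select author, title from article where id in (?, ?, ?)". That string can be passed to Connection.prepareStatement, binding each id with PreparedStatement.setInt(i, id) for i = 1..3.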
PostgreSQL
----------
- Make it easier to access the definition of full-text search indexes by
providing a function that parses the "pg_index.indexprs" column.
Other database managers
-----------------------
- Allow transactional DDL, deferrable constraints and composite types.
JDBC
----
- High-level methods that run a query with a single method call.
Example: Tuple[] tuples = query("select author,title from article
where id in (?)", ids);
- Integration with java.time data types.
HTTP - Servers
--------------
- Processing of arrays of elements composed of several parameters.
fields:0:type = string
fields:0:name = title
fields:0:parameters = 250
fields:0:not_null = true
fields:1:type = string
fields:1:name = author
fields:1:parameters = 250
fields:1:not_null = true
Another possibility is for the browser to generate arrays of JSON
objects from the forms.
"fields": [
{
"type": "string",
"name": "title",
"parameters": 250,
"not_null": true
}, {
"type": "string",
"name": "author",
"parameters": 250,
"not_null": true
}
]
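A minimal server-side sketch of processing the "fields:0:type" style parameters above, assuming the request parameters have already been decoded into a Map. The class IndexedParams is hypothetical (not a servlet or NextTypes API): it groups parameters named "prefix:index:property" into one map per array element, ordered by index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: groups flat parameters named "prefix:index:property"
 * into a list with one property map per array element.
 */
final class IndexedParams {
    private IndexedParams() {}

    static List<Map<String, String>> group(Map<String, String> params, String prefix) {
        // TreeMap keyed by the numeric index keeps elements in order 0, 1, 2, ...
        TreeMap<Integer, Map<String, String>> byIndex = new TreeMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            String[] parts = e.getKey().split(":", 3);
            if (parts.length != 3 || !parts[0].equals(prefix)) {
                continue; // not an element of this array, e.g. "lang"
            }
            // Throws NumberFormatException if the index is not a number.
            int index = Integer.parseInt(parts[1]);
            byIndex.computeIfAbsent(index, k -> new LinkedHashMap<>())
                   .put(parts[2], e.getValue());
        }
        return new ArrayList<>(byIndex.values());
    }
}
```

Values stay as strings ("250", "true"); converting them to int16, boolean and so on would be the job of a type layer such as the one proposed above.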
XML/HTML - BROWSER
------------------
- Input elements for different types of numbers with min and max
values: 16-bit integer, 32-bit integer, 32-bit float and 64-bit float.
- Input elements for images, audios and videos with preview.
- Timezone input element.
- Boolean input element with "true" and "false" values.
- Null value in file inputs.
- Clear button in file inputs like in date and time inputs.
- Show size in file inputs.
- Extension of the DOM API with high level and chainable methods.
Example: paragraph.appendElement("a").setAttribute("href",
"/article");
- Rename the "action" attribute of forms to "target", indicating the URL
where the action is executed. The "action" attribute moves to the
individual buttons of the form, allowing each button to execute a
different action.
Example:
<form target="/article">
<button action="delete">Delete</button>
<button action="export">Export</button>
</form>
- "select" elements that change a parameter of the current URL.
Example:
<select url-parameter="lang">
<option>en</option>
<option>es</option>
</select>
URL = https://demo.nexttypes.com/?lang=en
- "content-type" attribute in links, and a browser context menu to open
links with external applications using WebDAV, similar to a file
manager.
Example: <a href="" content-type="application/vnd.oasis.opendocument.text">
------------------------------
| Open link with ... |
| Open link with LibreOffice |
- Background submission of forms without XMLHttpRequest, showing the
result in a dialog window or as a file download, and running a
JavaScript function for subsequent actions.
Example: <form background show-progress callback=function()>
- A dialog showing the progress of form submission, with the total size,
the data transmitted and the average speed, and the option to pause or
cancel the submission.
- Dynamic datalist with a searchable JSON source. A "search" parameter
containing the input value is added to the data source URL.
Example:
<input list="article-list" name="article" type="text" />
<datalist id="article-list"
src="/article?lang=en&view=json&names" />
Example query URL: "/article?lang=en&view=json&names&search=Ne"
- Same appearance and operation of inputs with dynamic datalist in all
browsers.
- Option tags with icons.
Example: <option icon="/icons/save.svg">Save</option>
- Tab, tree and other widgets.
- Mechanism to close HTTPS sessions initiated with client certificate
authentication.
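The dynamic datalist query URL can be sketched in Java as well (a browser would do this natively; Java is used here only to show the URL format). DatalistQuery is a hypothetical name. It assumes the source URL has no "#fragment" and that the input value should be form-encoded with java.net.URLEncoder, which turns spaces into "+".

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds the URL a dynamic datalist would query. */
final class DatalistQuery {
    private DatalistQuery() {}

    /** Appends "search=<input>" to the datalist's src URL. */
    static String searchUrl(String src, String input) {
        // Use "&" if the src already has a query string, otherwise start one with "?".
        String separator = src.contains("?") ? "&" : "?";
        return src + separator + "search="
                + URLEncoder.encode(input, StandardCharsets.UTF_8);
    }
}
```

With the example above, searchUrl("/article?lang=en&view=json&names", "Ne") gives "/article?lang=en&view=json&names&search=Ne".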
JAVA
----
- Subclasses of String, or some system of String variants, that allow
assigning a regular expression or class that limits the valid values, to
avoid code injection or values that would crash the system.
Example: String:Type = "[a-z0-9 _]+"; or String:Type = TypeChecks;
String:Type type = "article_language"; -> correct
String:Type type = "Article-Language"; -> error
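Java has no String:Type syntax, and java.lang.String is final, so it cannot be subclassed. The closest approximation today is a small wrapper type that validates its value once, at construction. TypeName below is a hypothetical sketch of that approach using the regular expression [a-z0-9 _]+ (lower-case letters, digits, spaces and underscores); it is not an existing JDK or NextTypes class.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in for the proposed String:Type: wraps a String and
 * rejects any value that does not match the allowed pattern.
 */
final class TypeName {
    // Allowed characters: lower-case letters, digits, spaces and underscores.
    private static final Pattern VALID = Pattern.compile("[a-z0-9 _]+");

    private final String value;

    TypeName(String value) {
        if (value == null || !VALID.matcher(value).matches()) {
            throw new IllegalArgumentException("invalid type name: " + value);
        }
        this.value = value;
    }

    String value() {
        return value;
    }

    @Override
    public String toString() {
        return value;
    }
}
```

new TypeName("article_language") succeeds, while new TypeName("Article-Language") throws IllegalArgumentException. The check happens at run time rather than compile time, which is the gap a language-level String:Type would close.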
We can discuss the specifics of each system on its own mailing list or
group. For the general topic of system integration, I have created a
discussion in the GitHub project:
https://github.com/alejsanc/nexttypes/discussions/6
This email has been sent to the following mailing lists and groups:
pgsql-hackers(a)lists.postgresql.org
pgsql-jdbc(a)lists.postgresql.org
mozilla.dev.platform(a)googlegroups.com
public-html(a)w3.org
jdbc-spec-discuss(a)openjdk.java.net
jdk-dev(a)openjdk.java.net
maria-developers(a)lists.launchpad.net
chromium-dev(a)chromium.org
https://mysqlcommunity.slack.com/archives/C8R1336M7
Best regards.
Alejandro Sánchez.