Kristian, salut.
Hi Andrei!
Can you review this patch?
Thanks for taking care of this critical and complicated issue! I studied the patch to understand its idea, which is basically to roll back the slave GTID state as if it were one of the transaction's participating resources on the slave. This looks like the way to go, unless you can dismiss the following suggestion.

Why not defer the current eager/optimistic discard of old sub-id records in rpl_slave_state::record_gtid():

  mysql_mutex_lock(&LOCK_slave_state);
  if ((elem= get_element(gtid->domain_id)) == NULL)
  {
    mysql_mutex_unlock(&LOCK_slave_state);
    my_error(ER_OUT_OF_RESOURCES, MYF(0));
    err= 1;
    goto end;
  }
  if ((elist= elem->grab_list()) != NULL)
  {
    /* Delete any old stuff, but keep around the most recent one. */
    uint64 best_sub_id= elist->sub_id;
    list_element **best_ptr_ptr= &elist;
    cur= elist;
    while ((next= cur->next))
    {
      if (next->sub_id > best_sub_id)
      {
        best_sub_id= next->sub_id;
        best_ptr_ptr= &cur->next;
      }
      cur= next;
    }
    /*
      Delete the highest sub_id element from the old list, and put it back
      as the single-element new list.
    */
    cur= *best_ptr_ptr;
    *best_ptr_ptr= cur->next;
    cur->next= NULL;
    elem->list= cur;
  }
  mysql_mutex_unlock(&LOCK_slave_state);

until we know the transaction will be committed? E.g. the bottom of the stack

  Xid_log_event::do_apply_event
    rpl_slave_state::update_state_hash
      rpl_slave_state::update

looks like an appropriate place to me, but I may be missing some background or negative use cases.

Doing it this way, the slave GTID state would be updated right after the gtid_slave_pos insert+delete records are committed, while currently it happens in the opposite order. That is, neither order provides atomicity of updating the two objects, so it should not really matter, right?

Cheers,

Andrei
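For concreteness, here is a minimal self-contained C++ model of the pruning step quoted above. The simplified list_element type and the helper name prune_all_but_newest() are illustrative stand-ins, not server code; the sketch only isolates the "keep the newest sub_id, free the rest" logic that, per the suggestion, would run after commit instead of eagerly inside record_gtid():

  /*
    Hypothetical standalone model of the old-row pruning in record_gtid().
    Not MariaDB source; types and names are simplified for illustration.
  */
  #include <cstdint>
  #include <cstdio>
  #include <cstdlib>

  struct list_element
  {
    uint64_t sub_id;
    list_element *next;
  };

  /* Unlink and free every element except the one with the highest sub_id,
     returning the surviving single-element list. */
  static list_element *prune_all_but_newest(list_element *elist)
  {
    if (!elist)
      return nullptr;
    uint64_t best_sub_id= elist->sub_id;
    list_element **best_ptr_ptr= &elist;
    for (list_element *cur= elist; cur->next; cur= cur->next)
    {
      if (cur->next->sub_id > best_sub_id)
      {
        best_sub_id= cur->next->sub_id;
        best_ptr_ptr= &cur->next;
      }
    }
    /* Detach the newest element, then free the remainder of the list. */
    list_element *best= *best_ptr_ptr;
    *best_ptr_ptr= best->next;
    best->next= nullptr;
    while (elist)
    {
      list_element *next= elist->next;
      free(elist);
      elist= next;
    }
    return best;
  }

  int main()
  {
    /* Build a list with sub_ids 1..5, then prune; sub_id 5 survives. */
    list_element *head= nullptr;
    for (uint64_t id= 1; id <= 5; id++)
    {
      list_element *e= (list_element *)malloc(sizeof(*e));
      e->sub_id= id;
      e->next= head;
      head= e;
    }
    head= prune_all_but_newest(head);
    printf("kept sub_id=%llu\n", (unsigned long long)head->sub_id);
    free(head);
    return 0;
  }

Deferring a call like this until after commit is exactly the ordering question raised above: it trades eager cleanup for the guarantee that a rollback cannot undo the delete.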
This fixes a rather serious bug in parallel replication that was reported on maria-discuss@ here: https://lists.launchpad.net/maria-discuss/msg05253.html
The user ended up with millions of rows in mysql.gtid_slave_pos, which is quite bad.
I think this should go in 10.1. The bug is probably also present in 10.0, but it will be much harder to trigger since 10.0 does not have optimistic parallel replication. So leaving it unfixed in 10.0 seems ok.
I plan to manually merge it to 10.2 and 10.3 as well, since the merge to 10.3 is not entirely trivial, due to interaction with @@gtid_pos_auto_engines.
Github branch here:
https://github.com/knielsen/server/commits/gtid_table_garbage_rows
- Kristian.
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
revision-id: 2f4a0c5be2c5d5153c4253a49ba8820ab333a9a0 (mariadb-10.1.35-71-g2f4a0c5be2c)
parent(s): 1fc5a6f30c3a9c047dcf9a36b00026d98f286f6b
author: Kristian Nielsen
committer: Kristian Nielsen
timestamp: 2018-10-07 18:59:52 +0200
message:
Fix accumulation of old rows in mysql.gtid_slave_pos
This would happen especially in optimistic parallel replication, where there is a good chance that a transaction will be rolled back (due to conflicts) after it has executed record_gtid(). If the transaction did any deletions of old rows as part of record_gtid(), those deletions will be undone as well. And the code did not properly ensure that the deletions would be re-tried.
This patch makes record_gtid() remember the list of deletions done as part of a transaction. Then in rpl_slave_state::update() when the changes have been committed, we discard the list. However, in case of error and rollback, in cleanup_context() we will instead put the list back into rpl_global_gtid_slave_state so that the deletions will be re-tried later.
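As a rough illustration of this lifecycle (not the patch itself — the stand-in globals and function names below are hypothetical simplifications of the new rpl_group_info members), the three paths look like this in a self-contained C++ model:

  #include <cstdint>
  #include <cstdio>
  #include <cstdlib>

  /* Simplified stand-in for rpl_slave_state::list_element. */
  struct list_element { uint64_t sub_id; list_element *next; };

  static list_element *global_to_delete= nullptr; /* models rpl_global_gtid_slave_state's list */
  static list_element *pending_deletes= nullptr;  /* models rgi->pending_gtid_delete_list */

  /* record_gtid(): remember which rows this transaction deleted. */
  static void pending_deletes_save(list_element *list)
  {
    pending_deletes= list;
  }

  /* rpl_slave_state::update(): the transaction committed, discard the list. */
  static void pending_deletes_clear()
  {
    while (pending_deletes)
    {
      list_element *next= pending_deletes->next;
      free(pending_deletes);
      pending_deletes= next;
    }
  }

  /* cleanup_context() on rollback: the row deletions were undone, so put the
     list back onto the global state so the deletes are re-tried later. */
  static void pending_deletes_put_back()
  {
    while (pending_deletes)
    {
      list_element *next= pending_deletes->next;
      pending_deletes->next= global_to_delete;
      global_to_delete= pending_deletes;
      pending_deletes= next;
    }
  }

  int main()
  {
    /* A transaction's record_gtid() deletes two old rows. */
    list_element *deleted= nullptr;
    for (uint64_t id= 1; id <= 2; id++)
    {
      list_element *e= (list_element *)malloc(sizeof(*e));
      e->sub_id= id;
      e->next= deleted;
      deleted= e;
    }
    pending_deletes_save(deleted);

    /* The transaction conflicts and rolls back: re-queue the deletes. */
    pending_deletes_put_back();
    for (list_element *e= global_to_delete; e; e= e->next)
      printf("re-queued delete of sub_id=%llu\n", (unsigned long long)e->sub_id);

    /* On the successful retry, the deletes commit and the list is freed. */
    pending_deletes_save(global_to_delete);
    global_to_delete= nullptr;
    pending_deletes_clear();
    return 0;
  }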
Probably fixes part of the cause of MDEV-12147 as well.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
---
 .../suite/rpl/r/rpl_parallel_optimistic.result |  6 ++
 .../suite/rpl/t/rpl_parallel_optimistic.test   | 17 +++++
 sql/log_event.cc                               |  6 +-
 sql/rpl_gtid.cc                                | 64 ++++++++++++------
 sql/rpl_gtid.h                                 |  2 +-
 sql/rpl_rli.cc                                 | 79 ++++++++++++++++++++++
 sql/rpl_rli.h                                  | 11 +++
 7 files changed, 161 insertions(+), 24 deletions(-)
diff --git a/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result b/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
index 3cd4f8231bf..99bd8562ffe 100644
--- a/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
+++ b/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
@@ -571,4 +571,10 @@ SET GLOBAL slave_parallel_mode=@old_parallel_mode;
 SET GLOBAL slave_parallel_threads=@old_parallel_threads;
 include/start_slave.inc
 DROP TABLE t1, t2, t3;
+include/save_master_gtid.inc
+include/sync_with_master_gtid.inc
+Check that no more than the expected last two GTIDs are in mysql.gtid_slave_pos
+select count(*) from mysql.gtid_slave_pos order by domain_id, sub_id;
+count(*)
+2
 include/rpl_end.inc
diff --git a/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test b/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
index 9f6669279db..3867a3fdf3a 100644
--- a/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
+++ b/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
@@ -549,5 +549,22 @@ SET GLOBAL slave_parallel_threads=@old_parallel_threads;
 --connection server_1
 DROP TABLE t1, t2, t3;
+--source include/save_master_gtid.inc
+
+--connection server_2
+--source include/sync_with_master_gtid.inc
+# Check for left-over rows in table mysql.gtid_slave_pos (MDEV-12147).
+#
+# There was a bug when a transaction got a conflict and was rolled back. It
+# might have also handled deletion of some old rows, and these deletions would
+# then also be rolled back. And since the deletes were never re-tried, old no
+# longer needed rows would accumulate in the table without limit.
+#
+# The earlier part of this test file has plenty of transactions being rolled
+# back. But the last DROP TABLE statement runs on its own and should never
+# conflict, thus at this point the mysql.gtid_slave_pos table should be clean.
+--echo Check that no more than the expected last two GTIDs are in mysql.gtid_slave_pos
+select count(*) from mysql.gtid_slave_pos order by domain_id, sub_id;
+--connection server_1
 --source include/rpl_end.inc
diff --git a/sql/log_event.cc b/sql/log_event.cc
index e1912ad4620..e07b7002398 100644
--- a/sql/log_event.cc
+++ b/sql/log_event.cc
@@ -4429,7 +4429,7 @@ int Query_log_event::do_apply_event(rpl_group_info *rgi,
     gtid= rgi->current_gtid;
     if (rpl_global_gtid_slave_state->record_gtid(thd, &gtid, sub_id,
-                                                 true, false))
+                                                 rgi, false))
     {
       int errcode= thd->get_stmt_da()->sql_errno();
       if (!is_parallel_retry_error(rgi, errcode))
@@ -7132,7 +7132,7 @@ Gtid_list_log_event::do_apply_event(rpl_group_info *rgi)
       {
         if ((ret= rpl_global_gtid_slave_state->record_gtid(thd, &list[i],
                                                            sub_id_list[i],
-                                                           false, false)))
+                                                           NULL, false)))
           return ret;
         rpl_global_gtid_slave_state->update_state_hash(sub_id_list[i],
                                                        &list[i], NULL);
@@ -7639,7 +7639,7 @@ int Xid_log_event::do_apply_event(rpl_group_info *rgi)
   rgi->gtid_pending= false;
   gtid= rgi->current_gtid;
-  err= rpl_global_gtid_slave_state->record_gtid(thd, &gtid, sub_id, true,
+  err= rpl_global_gtid_slave_state->record_gtid(thd, &gtid, sub_id, rgi,
                                                 false);
   if (err)
   {
diff --git a/sql/rpl_gtid.cc b/sql/rpl_gtid.cc
index 7b1acf17ef5..94944b5b3e5 100644
--- a/sql/rpl_gtid.cc
+++ b/sql/rpl_gtid.cc
@@ -77,7 +77,7 @@ rpl_slave_state::record_and_update_gtid(THD *thd, rpl_group_info *rgi)
   rgi->gtid_pending= false;
   if (rgi->gtid_ignore_duplicate_state!=rpl_group_info::GTID_DUPLICATE_IGNORE)
   {
-    if (record_gtid(thd, &rgi->current_gtid, sub_id, false, false))
+    if (record_gtid(thd, &rgi->current_gtid, sub_id, NULL, false))
       DBUG_RETURN(1);
     update_state_hash(sub_id, &rgi->current_gtid, rgi);
   }
@@ -328,6 +328,8 @@ rpl_slave_state::update(uint32 domain_id, uint32 server_id, uint64 sub_id,
       }
     }
     rgi->gtid_ignore_duplicate_state= rpl_group_info::GTID_DUPLICATE_NULL;
+
+    rgi->pending_gtid_deletes_clear();
   }
   if (!(list_elem= (list_element *)my_malloc(sizeof(*list_elem), MYF(MY_WME))))
@@ -377,15 +379,24 @@ int
 rpl_slave_state::put_back_list(uint32 domain_id, list_element *list)
 {
   element *e;
+  int err= 0;
+
+  mysql_mutex_lock(&LOCK_slave_state);
   if (!(e= (element *)my_hash_search(&hash, (const uchar *)&domain_id, 0)))
-    return 1;
+  {
+    err= 1;
+    goto end;
+  }
   while (list)
   {
     list_element *next= list->next;
     e->add(list);
     list= next;
   }
-  return 0;
+
+end:
+  mysql_mutex_unlock(&LOCK_slave_state);
+  return err;
 }
@@ -468,12 +479,12 @@ gtid_check_rpl_slave_state_table(TABLE *table)
 /*
   Write a gtid to the replication slave state table.
-  Do it as part of the transaction, to get slave crash safety, or as a separate
-  transaction if !in_transaction (eg. MyISAM or DDL).
-
   gtid    The global transaction id for this event group.
   sub_id  Value allocated within the sub_id when the event group was
           read (sub_id must be consistent with commit order in master binlog).
+  rgi     rpl_group_info context, if we are recording the gtid transactionally
+          as part of replicating a transactional event. NULL if called from
+          outside of a replicated transaction.
   Note that caller must later ensure that the new gtid and sub_id is inserted
   into the appropriate HASH element with rpl_slave_state.add(), so that it can
@@ -481,13 +492,13 @@ gtid_check_rpl_slave_state_table(TABLE *table)
 */
 int
 rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
-                             bool in_transaction, bool in_statement)
+                             rpl_group_info *rgi, bool in_statement)
 {
   TABLE_LIST tlist;
   int err= 0;
   bool table_opened= false;
   TABLE *table;
-  list_element *elist= 0, *next;
+  list_element *elist= 0, *cur, *next;
   element *elem;
   ulonglong thd_saved_option= thd->variables.option_bits;
   Query_tables_list lex_backup;
@@ -558,7 +569,7 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
   thd->wsrep_ignore_table= true;
 #endif
-  if (!in_transaction)
+  if (!rgi)
   {
     DBUG_PRINT("info", ("resetting OPTION_BEGIN"));
     thd->variables.option_bits&=
@@ -601,9 +612,9 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
     if ((elist= elem->grab_list()) != NULL)
     {
       /* Delete any old stuff, but keep around the most recent one. */
-      list_element *cur= elist;
-      uint64 best_sub_id= cur->sub_id;
+      uint64 best_sub_id= elist->sub_id;
       list_element **best_ptr_ptr= &elist;
+      cur= elist;
       while ((next= cur->next))
       {
         if (next->sub_id > best_sub_id)
@@ -636,7 +647,8 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
     table->file->print_error(err, MYF(0));
     goto end;
   }
-  while (elist)
+  cur = elist;
+  while (cur)
   {
     uchar key_buffer[4+8];
@@ -646,9 +658,9 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
                     /* `break' does not work inside DBUG_EXECUTE_IF */
                     goto dbug_break; });
-    next= elist->next;
+    next= cur->next;
-    table->field[1]->store(elist->sub_id, true);
+    table->field[1]->store(cur->sub_id, true);
     /* domain_id is already set in table->record[0] from write_row() above. */
     key_copy(key_buffer, table->record[0], &table->key_info[0], 0, false);
     if (table->file->ha_index_read_map(table->record[1], key_buffer,
@@ -662,8 +674,7 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
         not want to endlessly error on the same element in case of table
         corruption or such.
       */
-      my_free(elist);
-      elist= next;
+      cur= next;
       if (err)
         break;
     }
@@ -686,18 +697,31 @@ IF_DBUG(dbug_break:, )
       */
       if (elist)
       {
-        mysql_mutex_lock(&LOCK_slave_state);
         put_back_list(gtid->domain_id, elist);
-        mysql_mutex_unlock(&LOCK_slave_state);
+        elist = 0;
       }
      ha_rollback_trans(thd, FALSE);
    }
    close_thread_tables(thd);
-   if (in_transaction)
+   if (rgi)
+   {
      thd->mdl_context.release_statement_locks();
+     /*
+       Save the list of old gtid entries we deleted. If this transaction
+       fails later for some reason and is rolled back, the deletion of those
+       entries will be rolled back as well, and we will need to put them back
+       on the to-be-deleted list so we can re-do the deletion. Otherwise
+       redundant rows in mysql.gtid_slave_pos may accumulate if transactions
+       are rolled back and retried after record_gtid().
+     */
+     rgi->pending_gtid_deletes_save(gtid->domain_id, elist);
+   }
    else
+   {
      thd->mdl_context.release_transactional_locks();
+     rpl_group_info::pending_gtid_deletes_free(elist);
+   }
  }
  thd->lex->restore_backup_query_tables_list(&lex_backup);
  thd->variables.option_bits= thd_saved_option;
@@ -1080,7 +1104,7 @@ rpl_slave_state::load(THD *thd, char *state_from_master, size_t len,
   if (gtid_parser_helper(&state_from_master, end, &gtid) ||
       !(sub_id= next_sub_id(gtid.domain_id)) ||
-      record_gtid(thd, &gtid, sub_id, false, in_statement) ||
+      record_gtid(thd, &gtid, sub_id, NULL, in_statement) ||
       update(gtid.domain_id, gtid.server_id, sub_id, gtid.seq_no, NULL))
     return 1;
   if (state_from_master == end)
diff --git a/sql/rpl_gtid.h b/sql/rpl_gtid.h
index 79d566bddbf..7bd639b768f 100644
--- a/sql/rpl_gtid.h
+++ b/sql/rpl_gtid.h
@@ -182,7 +182,7 @@ struct rpl_slave_state
            uint64 seq_no, rpl_group_info *rgi);
   int truncate_state_table(THD *thd);
   int record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
-                  bool in_transaction, bool in_statement);
+                  rpl_group_info *rgi, bool in_statement);
   uint64 next_sub_id(uint32 domain_id);
   int iterate(int (*cb)(rpl_gtid *, void *), void *data,
               rpl_gtid *extra_gtids, uint32 num_extra,
diff --git a/sql/rpl_rli.cc b/sql/rpl_rli.cc
index 64a1b535307..b35130c1505 100644
--- a/sql/rpl_rli.cc
+++ b/sql/rpl_rli.cc
@@ -1680,6 +1680,7 @@ rpl_group_info::reinit(Relay_log_info *rli)
   long_find_row_note_printed= false;
   did_mark_start_commit= false;
   gtid_ev_flags2= 0;
+  pending_gtid_delete_list= NULL;
   last_master_timestamp = 0;
   gtid_ignore_duplicate_state= GTID_DUPLICATE_NULL;
   speculation= SPECULATE_NO;
@@ -1804,6 +1805,12 @@ void rpl_group_info::cleanup_context(THD *thd, bool error)
       erroneously update the GTID position.
     */
     gtid_pending= false;
+
+    /*
+      Rollback will have undone any deletions of old rows we might have made
+      in mysql.gtid_slave_pos. Put those rows back on the list to be deleted.
+    */
+    pending_gtid_deletes_put_back();
   }
   m_table_map.clear_tables();
   slave_close_thread_tables(thd);
@@ -2027,6 +2034,78 @@ rpl_group_info::unmark_start_commit()
 }
+/*
+  When record_gtid() has deleted any old rows from the table
+  mysql.gtid_slave_pos as part of a replicated transaction, save the list of
+  rows deleted here.
+
+  If later the transaction fails (eg. optimistic parallel replication), the
+  deletes will be undone when the transaction is rolled back. Then we can
+  put back the list of rows into the rpl_global_gtid_slave_state, so that
+  we can re-do the deletes and avoid accumulating old rows in the table.
+*/
+void
+rpl_group_info::pending_gtid_deletes_save(uint32 domain_id,
+                                          rpl_slave_state::list_element *list)
+{
+  /*
+    We should never get to a state where we try to save a new pending list of
+    gtid deletes while we still have an old one. But make sure we handle it
+    anyway just in case, so we avoid leaving stray entries in the
+    mysql.gtid_slave_pos table.
+  */
+  DBUG_ASSERT(!pending_gtid_delete_list);
+  if (unlikely(pending_gtid_delete_list))
+    pending_gtid_deletes_put_back();
+
+  pending_gtid_delete_list= list;
+  pending_gtid_delete_list_domain= domain_id;
+}
+
+
+/*
+  Take the list recorded by pending_gtid_deletes_save() and put it back into
+  rpl_global_gtid_slave_state. This is needed if deletion of the rows was
+  rolled back due to transaction failure.
+*/
+void
+rpl_group_info::pending_gtid_deletes_put_back()
+{
+  if (pending_gtid_delete_list)
+  {
+    rpl_global_gtid_slave_state->put_back_list(pending_gtid_delete_list_domain,
+                                               pending_gtid_delete_list);
+    pending_gtid_delete_list= NULL;
+  }
+}
+
+
+/*
+  Free the list recorded by pending_gtid_deletes_save(). Done when the deletes
+  in the list have been permanently committed.
+*/
+void
+rpl_group_info::pending_gtid_deletes_clear()
+{
+  pending_gtid_deletes_free(pending_gtid_delete_list);
+  pending_gtid_delete_list= NULL;
+}
+
+
+void
+rpl_group_info::pending_gtid_deletes_free(rpl_slave_state::list_element *list)
+{
+  rpl_slave_state::list_element *next;
+
+  while (list)
+  {
+    next= list->next;
+    my_free(list);
+    list= next;
+  }
+}
+
+
 rpl_sql_thread_info::rpl_sql_thread_info(Rpl_filter *filter)
   : rpl_filter(filter)
 {
diff --git a/sql/rpl_rli.h b/sql/rpl_rli.h
index 74d5b6fe416..b40a34a54e6 100644
--- a/sql/rpl_rli.h
+++ b/sql/rpl_rli.h
@@ -676,6 +676,11 @@ struct rpl_group_info
   /* Needs room for "Gtid D-S-N\x00". */
   char gtid_info_buf[5+10+1+10+1+20+1];
+  /* List of not yet committed deletions in mysql.gtid_slave_pos. */
+  rpl_slave_state::list_element *pending_gtid_delete_list;
+  /* Domain associated with pending_gtid_delete_list. */
+  uint32 pending_gtid_delete_list_domain;
+
   /*
     The timestamp, from the master, of the commit event.
     Used to do delayed update of rli->last_master_timestamp, for getting
@@ -817,6 +822,12 @@ struct rpl_group_info
   char *gtid_info();
   void unmark_start_commit();
+  static void pending_gtid_deletes_free(rpl_slave_state::list_element *list);
+  void pending_gtid_deletes_save(uint32 domain_id,
+                                 rpl_slave_state::list_element *list);
+  void pending_gtid_deletes_put_back();
+  void pending_gtid_deletes_clear();
+
   time_t get_row_stmt_start_timestamp()
   {
     return row_stmt_start_timestamp;