Re: [Maria-developers] [Commits] 2f4a0c5be2c: Fix accumulation of old rows in mysql.gtid_slave_pos
Hi Andrei!

Can you review this patch? This fixes a rather serious bug in parallel replication that was reported on maria-discuss@ here:

  https://lists.launchpad.net/maria-discuss/msg05253.html

The user ended up with millions of rows in mysql.gtid_slave_pos, which is quite bad.

I think this should go in 10.1. The bug is probably also present in 10.0, but it will be much harder to trigger there, since 10.0 does not have optimistic parallel replication. So leaving it unfixed in 10.0 seems ok.

I plan to manually merge it to 10.2 and 10.3 as well, since the merge to 10.3 is not entirely trivial due to interaction with @@gtid_pos_auto_engines.

Github branch here:

  https://github.com/knielsen/server/commits/gtid_table_garbage_rows

 - Kristian.

Kristian Nielsen <knielsen@knielsen-hq.org> writes:
revision-id: 2f4a0c5be2c5d5153c4253a49ba8820ab333a9a0 (mariadb-10.1.35-71-g2f4a0c5be2c)
parent(s): 1fc5a6f30c3a9c047dcf9a36b00026d98f286f6b
author: Kristian Nielsen
committer: Kristian Nielsen
timestamp: 2018-10-07 18:59:52 +0200
message:
Fix accumulation of old rows in mysql.gtid_slave_pos
This would happen especially in optimistic parallel replication, where there is a good chance that a transaction will be rolled back (due to conflicts) after it has executed record_gtid(). If the transaction did any deletions of old rows as part of record_gtid(), those deletions would be undone as well, and the code did not properly ensure that they would be re-tried.

This patch makes record_gtid() remember the list of deletions done as part of a transaction. Then, in rpl_slave_state::update(), once the changes have been committed, we discard the list. However, in case of error and rollback, cleanup_context() will instead put the list back into rpl_global_gtid_slave_state so that the deletions will be re-tried later.
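In outline, the list goes through this life cycle (a simplified, self-contained sketch: list_element and the pending_gtid_deletes_*() names mirror the patch, while group_info and everything else are stand-ins for illustration, not the actual server types):

  #include <cstdlib>

  /* Stand-in for rpl_slave_state::list_element: one old row (sub_id) that
     record_gtid() deleted inside the still-open transaction. */
  struct list_element { unsigned long long sub_id; list_element *next; };

  /* Stand-in for the relevant part of rpl_group_info. */
  struct group_info
  {
    list_element *pending_gtid_delete_list= nullptr;

    /* record_gtid(): remember which old rows the transaction deleted. */
    void pending_gtid_deletes_save(list_element *list)
    { pending_gtid_delete_list= list; }

    /* rpl_slave_state::update(), after commit: the deletes are durable,
       so the remembered list can simply be freed. */
    void pending_gtid_deletes_clear()
    {
      for (list_element *l= pending_gtid_delete_list; l; )
      {
        list_element *next= l->next;
        free(l);
        l= next;
      }
      pending_gtid_delete_list= nullptr;
    }

    /* cleanup_context(), after rollback: the deletes were undone, so hand
       the rows back to the global to-be-deleted list to be re-tried. */
    void pending_gtid_deletes_put_back(list_element **global_to_delete)
    {
      while (pending_gtid_delete_list)
      {
        list_element *next= pending_gtid_delete_list->next;
        pending_gtid_delete_list->next= *global_to_delete;
        *global_to_delete= pending_gtid_delete_list;
        pending_gtid_delete_list= next;
      }
    }
  };

The key point is that on rollback the rows change owner back to the global state instead of being lost.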
Probably fixes part of the cause of MDEV-12147 as well.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
---
 .../suite/rpl/r/rpl_parallel_optimistic.result | 6 ++
 .../suite/rpl/t/rpl_parallel_optimistic.test   | 17 +++++
 sql/log_event.cc                               |  6 +-
 sql/rpl_gtid.cc                                | 64 ++++++++++++------
 sql/rpl_gtid.h                                 |  2 +-
 sql/rpl_rli.cc                                 | 79 ++++++++++++++++++++++
 sql/rpl_rli.h                                  | 11 +++
 7 files changed, 161 insertions(+), 24 deletions(-)
diff --git a/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result b/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
index 3cd4f8231bf..99bd8562ffe 100644
--- a/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
+++ b/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
@@ -571,4 +571,10 @@ SET GLOBAL slave_parallel_mode=@old_parallel_mode;
 SET GLOBAL slave_parallel_threads=@old_parallel_threads;
 include/start_slave.inc
 DROP TABLE t1, t2, t3;
+include/save_master_gtid.inc
+include/sync_with_master_gtid.inc
+Check that no more than the expected last two GTIDs are in mysql.gtid_slave_pos
+select count(*) from mysql.gtid_slave_pos order by domain_id, sub_id;
+count(*)
+2
 include/rpl_end.inc
diff --git a/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test b/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
index 9f6669279db..3867a3fdf3a 100644
--- a/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
+++ b/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
@@ -549,5 +549,22 @@ SET GLOBAL slave_parallel_threads=@old_parallel_threads;
 
 --connection server_1
 DROP TABLE t1, t2, t3;
+--source include/save_master_gtid.inc
+
+--connection server_2
+--source include/sync_with_master_gtid.inc
+# Check for left-over rows in table mysql.gtid_slave_pos (MDEV-12147).
+#
+# There was a bug when a transaction got a conflict and was rolled back. It
+# might have also handled deletion of some old rows, and these deletions would
+# then also be rolled back. And since the deletes were never re-tried, old no
+# longer needed rows would accumulate in the table without limit.
+#
+# The earlier part of this test file has plenty of transactions being rolled
+# back. But the last DROP TABLE statement runs on its own and should never
+# conflict, thus at this point the mysql.gtid_slave_pos table should be clean.
+--echo Check that no more than the expected last two GTIDs are in mysql.gtid_slave_pos
+select count(*) from mysql.gtid_slave_pos order by domain_id, sub_id;
 
+--connection server_1
 --source include/rpl_end.inc
diff --git a/sql/log_event.cc b/sql/log_event.cc
index e1912ad4620..e07b7002398 100644
--- a/sql/log_event.cc
+++ b/sql/log_event.cc
@@ -4429,7 +4429,7 @@ int Query_log_event::do_apply_event(rpl_group_info *rgi,
     gtid= rgi->current_gtid;
     if (rpl_global_gtid_slave_state->record_gtid(thd, &gtid, sub_id,
-                                                 true, false))
+                                                 rgi, false))
     {
       int errcode= thd->get_stmt_da()->sql_errno();
       if (!is_parallel_retry_error(rgi, errcode))
@@ -7132,7 +7132,7 @@ Gtid_list_log_event::do_apply_event(rpl_group_info *rgi)
     {
       if ((ret= rpl_global_gtid_slave_state->record_gtid(thd, &list[i],
                                                          sub_id_list[i],
-                                                         false, false)))
+                                                         NULL, false)))
         return ret;
       rpl_global_gtid_slave_state->update_state_hash(sub_id_list[i], &list[i],
                                                      NULL);
@@ -7639,7 +7639,7 @@ int Xid_log_event::do_apply_event(rpl_group_info *rgi)
     rgi->gtid_pending= false;
 
     gtid= rgi->current_gtid;
-    err= rpl_global_gtid_slave_state->record_gtid(thd, &gtid, sub_id, true,
+    err= rpl_global_gtid_slave_state->record_gtid(thd, &gtid, sub_id, rgi,
                                                   false);
     if (err)
     {
diff --git a/sql/rpl_gtid.cc b/sql/rpl_gtid.cc
index 7b1acf17ef5..94944b5b3e5 100644
--- a/sql/rpl_gtid.cc
+++ b/sql/rpl_gtid.cc
@@ -77,7 +77,7 @@ rpl_slave_state::record_and_update_gtid(THD *thd, rpl_group_info *rgi)
   rgi->gtid_pending= false;
   if (rgi->gtid_ignore_duplicate_state!=rpl_group_info::GTID_DUPLICATE_IGNORE)
   {
-    if (record_gtid(thd, &rgi->current_gtid, sub_id, false, false))
+    if (record_gtid(thd, &rgi->current_gtid, sub_id, NULL, false))
      DBUG_RETURN(1);
     update_state_hash(sub_id, &rgi->current_gtid, rgi);
   }
@@ -328,6 +328,8 @@ rpl_slave_state::update(uint32 domain_id, uint32 server_id, uint64 sub_id,
       }
     }
     rgi->gtid_ignore_duplicate_state= rpl_group_info::GTID_DUPLICATE_NULL;
+
+    rgi->pending_gtid_deletes_clear();
   }
 
   if (!(list_elem= (list_element *)my_malloc(sizeof(*list_elem), MYF(MY_WME))))
@@ -377,15 +379,24 @@ int
 rpl_slave_state::put_back_list(uint32 domain_id, list_element *list)
 {
   element *e;
+  int err= 0;
+
+  mysql_mutex_lock(&LOCK_slave_state);
   if (!(e= (element *)my_hash_search(&hash, (const uchar *)&domain_id, 0)))
-    return 1;
+  {
+    err= 1;
+    goto end;
+  }
   while (list)
   {
     list_element *next= list->next;
     e->add(list);
     list= next;
   }
-  return 0;
+
+end:
+  mysql_mutex_unlock(&LOCK_slave_state);
+  return err;
 }
 
@@ -468,12 +479,12 @@ gtid_check_rpl_slave_state_table(TABLE *table)
 
 /*
   Write a gtid to the replication slave state table.
 
-  Do it as part of the transaction, to get slave crash safety, or as a separate
-  transaction if !in_transaction (eg. MyISAM or DDL).
-
   gtid    The global transaction id for this event group.
   sub_id  Value allocated within the sub_id when the event group was
           read (sub_id must be consistent with commit order in master binlog).
+  rgi     rpl_group_info context, if we are recording the gtid transactionally
+          as part of replicating a transactional event. NULL if called from
+          outside of a replicated transaction.
 
   Note that caller must later ensure that the new gtid and sub_id is inserted
   into the appropriate HASH element with rpl_slave_state.add(), so that it can
@@ -481,13 +492,13 @@ gtid_check_rpl_slave_state_table(TABLE *table)
 */
 int
 rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
-                             bool in_transaction, bool in_statement)
+                             rpl_group_info *rgi, bool in_statement)
 {
   TABLE_LIST tlist;
   int err= 0;
   bool table_opened= false;
   TABLE *table;
-  list_element *elist= 0, *next;
+  list_element *elist= 0, *cur, *next;
   element *elem;
   ulonglong thd_saved_option= thd->variables.option_bits;
   Query_tables_list lex_backup;
@@ -558,7 +569,7 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
   thd->wsrep_ignore_table= true;
 #endif
 
-  if (!in_transaction)
+  if (!rgi)
   {
     DBUG_PRINT("info", ("resetting OPTION_BEGIN"));
     thd->variables.option_bits&=
@@ -601,9 +612,9 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
   if ((elist= elem->grab_list()) != NULL)
   {
     /* Delete any old stuff, but keep around the most recent one. */
-    list_element *cur= elist;
-    uint64 best_sub_id= cur->sub_id;
+    uint64 best_sub_id= elist->sub_id;
     list_element **best_ptr_ptr= &elist;
+    cur= elist;
     while ((next= cur->next))
     {
       if (next->sub_id > best_sub_id)
@@ -636,7 +647,8 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
     table->file->print_error(err, MYF(0));
     goto end;
   }
-  while (elist)
+  cur = elist;
+  while (cur)
   {
     uchar key_buffer[4+8];
@@ -646,9 +658,9 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
       /* `break' does not work inside DBUG_EXECUTE_IF */
       goto dbug_break;
     });
 
-    next= elist->next;
+    next= cur->next;
 
-    table->field[1]->store(elist->sub_id, true);
+    table->field[1]->store(cur->sub_id, true);
     /* domain_id is already set in table->record[0] from write_row() above. */
     key_copy(key_buffer, table->record[0], &table->key_info[0], 0, false);
     if (table->file->ha_index_read_map(table->record[1], key_buffer,
@@ -662,8 +674,7 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
       not want to endlessly error on the same element in case of table
       corruption or such.
     */
-    my_free(elist);
-    elist= next;
+    cur= next;
     if (err)
       break;
   }
@@ -686,18 +697,31 @@ IF_DBUG(dbug_break:, )
     */
     if (elist)
     {
-      mysql_mutex_lock(&LOCK_slave_state);
       put_back_list(gtid->domain_id, elist);
-      mysql_mutex_unlock(&LOCK_slave_state);
+      elist = 0;
     }
 
     ha_rollback_trans(thd, FALSE);
   }
   close_thread_tables(thd);
-  if (in_transaction)
+  if (rgi)
+  {
     thd->mdl_context.release_statement_locks();
+    /*
+      Save the list of old gtid entries we deleted. If this transaction
+      fails later for some reason and is rolled back, the deletion of those
+      entries will be rolled back as well, and we will need to put them back
+      on the to-be-deleted list so we can re-do the deletion. Otherwise
+      redundant rows in mysql.gtid_slave_pos may accumulate if transactions
+      are rolled back and retried after record_gtid().
+    */
+    rgi->pending_gtid_deletes_save(gtid->domain_id, elist);
+  }
   else
+  {
     thd->mdl_context.release_transactional_locks();
+    rpl_group_info::pending_gtid_deletes_free(elist);
+  }
   }
   thd->lex->restore_backup_query_tables_list(&lex_backup);
   thd->variables.option_bits= thd_saved_option;
@@ -1080,7 +1104,7 @@ rpl_slave_state::load(THD *thd, char *state_from_master, size_t len,
 
     if (gtid_parser_helper(&state_from_master, end, &gtid) ||
         !(sub_id= next_sub_id(gtid.domain_id)) ||
-        record_gtid(thd, &gtid, sub_id, false, in_statement) ||
+        record_gtid(thd, &gtid, sub_id, NULL, in_statement) ||
         update(gtid.domain_id, gtid.server_id, sub_id, gtid.seq_no, NULL))
       return 1;
     if (state_from_master == end)
diff --git a/sql/rpl_gtid.h b/sql/rpl_gtid.h
index 79d566bddbf..7bd639b768f 100644
--- a/sql/rpl_gtid.h
+++ b/sql/rpl_gtid.h
@@ -182,7 +182,7 @@ struct rpl_slave_state
              uint64 seq_no, rpl_group_info *rgi);
   int truncate_state_table(THD *thd);
   int record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
-                  bool in_transaction, bool in_statement);
+                  rpl_group_info *rgi, bool in_statement);
   uint64 next_sub_id(uint32 domain_id);
   int iterate(int (*cb)(rpl_gtid *, void *), void *data,
               rpl_gtid *extra_gtids, uint32 num_extra,
diff --git a/sql/rpl_rli.cc b/sql/rpl_rli.cc
index 64a1b535307..b35130c1505 100644
--- a/sql/rpl_rli.cc
+++ b/sql/rpl_rli.cc
@@ -1680,6 +1680,7 @@ rpl_group_info::reinit(Relay_log_info *rli)
   long_find_row_note_printed= false;
   did_mark_start_commit= false;
   gtid_ev_flags2= 0;
+  pending_gtid_delete_list= NULL;
   last_master_timestamp = 0;
   gtid_ignore_duplicate_state= GTID_DUPLICATE_NULL;
   speculation= SPECULATE_NO;
@@ -1804,6 +1805,12 @@ void rpl_group_info::cleanup_context(THD *thd, bool error)
       erroneously update the GTID position.
     */
     gtid_pending= false;
+
+    /*
+      Rollback will have undone any deletions of old rows we might have made
+      in mysql.gtid_slave_pos. Put those rows back on the list to be deleted.
+    */
+    pending_gtid_deletes_put_back();
   }
   m_table_map.clear_tables();
   slave_close_thread_tables(thd);
@@ -2027,6 +2034,78 @@ rpl_group_info::unmark_start_commit()
 }
 
 
+/*
+  When record_gtid() has deleted any old rows from the table
+  mysql.gtid_slave_pos as part of a replicated transaction, save the list of
+  rows deleted here.
+
+  If later the transaction fails (eg. optimistic parallel replication), the
+  deletes will be undone when the transaction is rolled back. Then we can
+  put back the list of rows into the rpl_global_gtid_slave_state, so that
+  we can re-do the deletes and avoid accumulating old rows in the table.
+*/
+void
+rpl_group_info::pending_gtid_deletes_save(uint32 domain_id,
+                                          rpl_slave_state::list_element *list)
+{
+  /*
+    We should never get to a state where we try to save a new pending list of
+    gtid deletes while we still have an old one. But make sure we handle it
+    anyway just in case, so we avoid leaving stray entries in the
+    mysql.gtid_slave_pos table.
+  */
+  DBUG_ASSERT(!pending_gtid_delete_list);
+  if (unlikely(pending_gtid_delete_list))
+    pending_gtid_deletes_put_back();
+
+  pending_gtid_delete_list= list;
+  pending_gtid_delete_list_domain= domain_id;
+}
+
+
+/*
+  Take the list recorded by pending_gtid_deletes_save() and put it back into
+  rpl_global_gtid_slave_state. This is needed if deletion of the rows was
+  rolled back due to transaction failure.
+*/
+void
+rpl_group_info::pending_gtid_deletes_put_back()
+{
+  if (pending_gtid_delete_list)
+  {
+    rpl_global_gtid_slave_state->put_back_list(pending_gtid_delete_list_domain,
+                                               pending_gtid_delete_list);
+    pending_gtid_delete_list= NULL;
+  }
+}
+
+
+/*
+  Free the list recorded by pending_gtid_deletes_save(). Done when the deletes
+  in the list have been permanently committed.
+*/
+void
+rpl_group_info::pending_gtid_deletes_clear()
+{
+  pending_gtid_deletes_free(pending_gtid_delete_list);
+  pending_gtid_delete_list= NULL;
+}
+
+
+void
+rpl_group_info::pending_gtid_deletes_free(rpl_slave_state::list_element *list)
+{
+  rpl_slave_state::list_element *next;
+
+  while (list)
+  {
+    next= list->next;
+    my_free(list);
+    list= next;
+  }
+}
+
+
 rpl_sql_thread_info::rpl_sql_thread_info(Rpl_filter *filter)
   : rpl_filter(filter)
 {
diff --git a/sql/rpl_rli.h b/sql/rpl_rli.h
index 74d5b6fe416..b40a34a54e6 100644
--- a/sql/rpl_rli.h
+++ b/sql/rpl_rli.h
@@ -676,6 +676,11 @@ struct rpl_group_info
   /* Needs room for "Gtid D-S-N\x00". */
   char gtid_info_buf[5+10+1+10+1+20+1];
 
+  /* List of not yet committed deletions in mysql.gtid_slave_pos. */
+  rpl_slave_state::list_element *pending_gtid_delete_list;
+  /* Domain associated with pending_gtid_delete_list. */
+  uint32 pending_gtid_delete_list_domain;
+
   /*
     The timestamp, from the master, of the commit event.
     Used to do delayed update of rli->last_master_timestamp, for getting
@@ -817,6 +822,12 @@ struct rpl_group_info
   char *gtid_info();
   void unmark_start_commit();
 
+  static void pending_gtid_deletes_free(rpl_slave_state::list_element *list);
+  void pending_gtid_deletes_save(uint32 domain_id,
+                                 rpl_slave_state::list_element *list);
+  void pending_gtid_deletes_put_back();
+  void pending_gtid_deletes_clear();
+
   time_t get_row_stmt_start_timestamp()
   {
     return row_stmt_start_timestamp;
Kristian, salute.
> Hi Andrei!
>
> Can you review this patch?
Thanks for taking care of this critical and complicated issue!

I studied the patch to understand its idea, which is basically to implement rollback of the slave gtid state, treating it as a resource participating in the slave transaction. This would really be the way to go, unless you dismiss the following tip.

Why don't we defer the current eager/optimistic discard of old sub-id records in rpl_slave_state::record_gtid()

    mysql_mutex_lock(&LOCK_slave_state);
    if ((elem= get_element(gtid->domain_id)) == NULL)
    {
      mysql_mutex_unlock(&LOCK_slave_state);
      my_error(ER_OUT_OF_RESOURCES, MYF(0));
      err= 1;
      goto end;
    }
    if ((elist= elem->grab_list()) != NULL)
    {
      /* Delete any old stuff, but keep around the most recent one. */
      uint64 best_sub_id= elist->sub_id;
      list_element **best_ptr_ptr= &elist;
      cur= elist;
      while ((next= cur->next))
      {
        if (next->sub_id > best_sub_id)
        {
          best_sub_id= next->sub_id;
          best_ptr_ptr= &cur->next;
        }
        cur= next;
      }
      /*
        Delete the highest sub_id element from the old list, and put it back
        as the single-element new list.
      */
      cur= *best_ptr_ptr;
      *best_ptr_ptr= cur->next;
      cur->next= NULL;
      elem->list= cur;
    }
    mysql_mutex_unlock(&LOCK_slave_state);

until we know the transaction will be committed? E.g. the bottom of this stack

    Xid_log_event::do_apply_event
      rpl_slave_state::update_state_hash
        rpl_slave_state::update

looks like an appropriate place to me, but I may be missing some background or negative use cases.

Doing it this way makes the slave gtid state be updated right after the gtid_slave_pos insert+delete records are committed, while currently it happens in the opposite order. That is, neither order provides atomicity of updating the two objects, so it should not really matter, right?

Cheers,

Andrei
andrei.elkin@pp.inet.fi writes:
> Why don't we defer the current eager/optimistic discard of old sub-id
> records in rpl_slave_state::record_gtid()
>
>     mysql_mutex_lock(&LOCK_slave_state);
>
>     if ((elist= elem->grab_list()) != NULL)
>     {
>       /* Delete any old stuff, but keep around the most recent one. */
>
>     mysql_mutex_unlock(&LOCK_slave_state);
>
> until we know the transaction will be committed? E.g. the bottom of this
> stack
>
>     Xid_log_event::do_apply_event
>       rpl_slave_state::update_state_hash
>         rpl_slave_state::update
We need to have those records removed from the state immediately; otherwise the next record_gtid() (from another parallel replication worker) will try to grab and delete the same records, leading to extra redundant delete calls and loss of efficiency.

We cannot hold the lock on LOCK_slave_state until we have committed, of course, as that would severely reduce parallelism. And we cannot defer the deletion of records until rpl_slave_state::update(), as that would require starting another transaction, which is again inefficient.

However, I do like the idea of deferring the handling of deletions from record_gtid() to rpl_slave_state::update(). I have never been entirely happy about the overhead of one row deletion in every replicated transaction.

Maybe the deletion of old rows (and the update of the gtid state) could be done in the replication background thread instead (handle_slave_background())? In rpl_slave_state::update(), we would increment a counter. When this counter reaches 100 (or some configurable value), we would signal the background thread to grab the list and delete all of it in one go. This way we batch up more row deletions and avoid the delete overhead in the time-critical record_gtid(), which is probably more efficient. We do introduce some extra transactions, but that can be minimised by adjusting the batch size (eg. with a batch size of 100, just 1% more transactions). A rough sketch of this idea follows below.

I can try to create a patch to do this. Maybe moving deletions from record_gtid() into the background thread is too big a change for 10.1/10.2? But we could use the current patch for 10.1, and do the more complicated patch only in 10.3 (or 10.4?).

What do you think?

 - Kristian.
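For concreteness, a rough self-contained sketch of the batching idea (gtid_delete_batcher and every other name in it are invented for illustration only; the real logic would live in rpl_slave_state::update() and handle_slave_background(), with the actual deletion done as a normal transaction against mysql.gtid_slave_pos):

  #include <mutex>
  #include <condition_variable>

  struct gtid_delete_batcher
  {
    std::mutex lock;
    std::condition_variable cond;
    unsigned pending_count= 0;
    static constexpr unsigned batch_size= 100;  // would be a server variable

    /* Called from the commit path for each committed transaction. */
    void note_committed_gtid()
    {
      std::lock_guard<std::mutex> guard(lock);
      if (++pending_count >= batch_size)
        cond.notify_one();                      // wake the background thread
    }

    /* Background thread: wait for a full batch, then delete the obsolete
       rows in one transaction instead of one delete per transaction. */
    void background_loop()
    {
      for (;;)
      {
        std::unique_lock<std::mutex> guard(lock);
        cond.wait(guard, [this] { return pending_count >= batch_size; });
        pending_count= 0;
        guard.unlock();
        /* Here: grab the list of obsolete (domain_id, sub_id) pairs and
           issue the DELETEs from mysql.gtid_slave_pos in a single
           transaction. */
      }
    }
  };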
The 10.1 patch is good. But one question,
> Hi Andrei!
>
> Can you review this patch? ...
>     ha_rollback_trans(thd, FALSE);
>   }
>   close_thread_tables(thd);
> -  if (in_transaction)
> +  if (rgi)
> +  {
>     thd->mdl_context.release_statement_locks();
> +    /*
> +      Save the list of old gtid entries we deleted. If this transaction
> +      fails later for some reason and is rolled back, the deletion of those
> +      entries will be rolled back as well, and we will need to put them back
> +      on the to-be-deleted list so we can re-do the deletion. Otherwise
> +      redundant rows in mysql.gtid_slave_pos may accumulate if transactions
> +      are rolled back and retried after record_gtid().
> +    */
> +    rgi->pending_gtid_deletes_save(gtid->domain_id, elist);
> +  }
>   else
> +  {
>     thd->mdl_context.release_transactional_locks();
> +    rpl_group_info::pending_gtid_deletes_free(elist);
> +  }
Why isn't rpl_group_info::pending_gtid_deletes_clear() called here?
andrei.elkin@pp.inet.fi writes:
> The 10.1 patch is good.
Thanks!
>> Maybe moving deletions from record_gtid() into the background thread is
>> too big a change for 10.1/10.2? But we could use the current patch for
>> 10.1, and do the more complicated patch only in 10.3 (or 10.4?).
>>
>> What do you think?
> Let's stick to your plan.
Ok, great. I'll push this patch (if you have no further questions/comments), and work on a new patch for 10.3 which moves the delete logic into the replication background thread.
> But one question,
>> @@ -662,8 +674,7 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
>>
>> +  if (rgi)
>> +  {
>>      thd->mdl_context.release_statement_locks();
>> +    /*
>> +      Save the list of old gtid entries we deleted. If this transaction
>> +      fails later for some reason and is rolled back, the deletion of those
>> +      entries will be rolled back as well, and we will need to put them back
>> +      on the to-be-deleted list so we can re-do the deletion. Otherwise
>> +      redundant rows in mysql.gtid_slave_pos may accumulate if transactions
>> +      are rolled back and retried after record_gtid().
>> +    */
>> +    rgi->pending_gtid_deletes_save(gtid->domain_id, elist);
>> +  }
>>    else
>> +  {
>>      thd->mdl_context.release_transactional_locks();
>> +    rpl_group_info::pending_gtid_deletes_free(elist);
>> +  }
> Why isn't rpl_group_info::pending_gtid_deletes_clear() called here?
Just because in the else-branch we did not run pending_gtid_deletes_save(). pending_gtid_deletes_clear() frees the list previously stored in the rgi by pending_gtid_deletes_save(), while pending_gtid_deletes_free() just frees an explicitly given list.

Thanks,

 - Kristian.
Hi Andrei!

I have now pushed the patch to 10.1 (and merged it to 10.3, since the merge interacts with the per-engine mysql.gtid_slave_pos feature and requires non-obvious code and testcase changes).

And I've started working on the improved 10.3 patch where the deletion is moved to the replication background thread.

Thanks,

 - Kristian.