Nirbhay Choubey <nirbhay@skysql.com> writes:
------------------------------------------------------------ revno: 3879 revision-id: nirbhay@skysql.com-20140912181622-s33e7r0vtgpju95q parent: nirbhay@skysql.com-20140814224304-rtkeal7p9lk06li1 committer: Nirbhay Choubey <nirbhay@skysql.com> branch nick: maria-10.0-galera_6593 timestamp: Fri 2014-09-12 14:16:22 -0400 message: MDEV-6593 : domain_id based replication filters
Added support for IGNORE_DOMAIN_IDS and DO_DOMAIN_IDS.
Here is my review. Some general comments, followed by some detailed comments in-line in the patch: 1. Did you prepare any documentation? You could either add it to the knowledgebase directly, or write up a text file and ask Daniel to add it in relevant places. 2. Did you run the full test suite? I would have expected at least some tests in the multi_source suite to need modification. The chance touches enough things that you should test it in a feature tree in Buildbot to ensure that it is green before pushing it to a main tree. 3. I think some extra test cases are needed: - You should test (and document) what happens when the same domain_id is in both the DO and the IGNORE list. Is it actually ever sensible to have both DO_DOMAIN_IDS and IGNORE_DOMAIN_IDS ? If not, I think it should be an error to try. - You should also add a test using non-GTID mode. - You should also add a test where parallel replication is used (--slave-parallel-threads > 0). - Add a test for the following scenario when using GTID mode: Some domain id is ignored, some events are received from the master and ignored. Then the slave is stopped, and the ignore rule is removed. What happens when the slave is restarted and reconnects? I think the previously ignored domain id events will now be re-fetched and replicated. I guess that's ok, but it should be documented. Also test what happens when binlog files on the master have been rotated and purged, so that the events are no longer available on the master (an error should occur). - Add a similar test for when not using GTID mode, the last event replicated by the slave is ignored, the slave is stopped, the ignore rule removed and the slave restarted; will the last event now be re-fetched or not (I think not). - Add a test that checks that seconds_behind_master is properly updated to zero even when the last event is ignored due to DO_DOMAIN_IDS/IGNORE_DOMAIN_IDS. It can perhaps be a bit tricky to test, but I think you can use this existing test as a basis: mysql-test/suite/rpl/t/rpl_parallel2.test - It would also be good to check that the code works when used on an old master (which does not send GTID events). You can do this by creating a binlog file with a 5.5 server and storing it in mysql-test/std_data/, and then installing it in a master; there should be several other test cases that already do this. 4. Also see detailed comments for some possible problems with the implementation. The most serious is probably to ensure that events are not skipped after the end of the group, we need a couple of tests for this, see below. Also, there is a small memory leak, see below.
+DO_DOMAIN_IDS (before) : 1 +IGNORE_DOMAIN_IDS (before) : 2 +CHANGE MASTER TO DO_DOMAIN_IDS=(4,4,5,1,7,7,7,1,1,2,6,8,1,4,5,5,9,3), MASTER_USE_GTID=current_pos;
So if a domain_id is specified multiple times, the redundant ones will be ignored. It would seem preferable to give an error... but as far as I can see the behaviour is the same for IGNORE_SERVER_IDS, so better to be consistent as you did in the code. Just be sure to add this to the documentation.
=== added file 'mysql-test/suite/rpl/t/rpl_domain_id_filter.test' --- a/mysql-test/suite/rpl/t/rpl_domain_id_filter.test 1970-01-01 00:00:00 +0000 +++ b/mysql-test/suite/rpl/t/rpl_domain_id_filter.test 2014-09-12 18:16:22 +0000
+SET @gtid_domain_id_saved = @@session.gtid_domain_id;
This is not needed, nor do you use it anywhere, you can remove it.
+##### Case 3: When both DO_DOMAIN_IDS and IGNORE_DOMAIN_IDS are non-empty + +source include/stop_slave.inc; +let $do_domain_ids_before= query_get_value(SHOW SLAVE STATUS, Replicate_Do_Domain_Ids, 1); +let $ignore_domain_ids_before= query_get_value(SHOW SLAVE STATUS, Replicate_Ignore_Domain_Ids, 1); +--echo DO_DOMAIN_IDS (before) : $do_domain_ids_before +--echo IGNORE_DOMAIN_IDS (before) : $ignore_domain_ids_before + +# Replicate events belonging to "domain_id 1". (DO_DOMAIN_IDS should win!) +CHANGE MASTER TO DO_DOMAIN_IDS=(1), IGNORE_DOMAIN_IDS=(2), MASTER_USE_GTID=current_pos;
Did you intend here to test overlapping ids in DO_DOMAIN_IDS and IGNORE_DOMAIN_IDS (due to the comment "DO_DOMAIN_IDS should win") ? But if I understand correctly, IGNORE_DOMAIN_IDS will be ignored if DO_DOMAIN_IDS is non-empty. So I think it should be an error to try to set both - do you agree?
+--source include/rpl_end.inc
Please add additional testcases, as suggested at the start of review.
=== modified file 'sql/rpl_mi.cc' --- a/sql/rpl_mi.cc 2014-02-19 10:05:15 +0000 +++ b/sql/rpl_mi.cc 2014-09-12 18:16:22 +0000 @@ -513,6 +513,25 @@ else mi->using_gtid= Master_info::USE_GTID_NO; } + + // DO_DOMAIN_IDS + if (0 == strncmp(buf, STRING_WITH_LEN("do_domain_ids=")) && + mi->domain_id_filter.init_ids(buf + sizeof("do_domain_ids"), + Domain_id_filter::DO_DOMAIN_IDS)) + { + sql_print_error("Failed to initialize master info do_domain_ids"); + goto errwithmsg; + } + + // IGNORE_DOMAIN_IDS + if (0 == strncmp(buf, STRING_WITH_LEN("ignore_domain_ids=")) && + mi->domain_id_filter.init_ids(buf + sizeof("ignore_domain_ids"), + Domain_id_filter::IGNORE_DOMAIN_IDS)) + { + sql_print_error("Failed to initialize master info " + "ignore_domain_ids"); + goto errwithmsg; + }
So here you use the same style as adjacent code (that I wrote). However, I have since learned that the style (0 == strncmp(...)) is not used in the server. So please change this, as well as my existing code, to use either strncmp(...) == 0 or !strncmp(), as you prefer. Another thing from my old code is the sizeof("do_domain_ids"), which highly confused me for a bit before I realised that while the '=' is not included in the sizeof(), the terminating '\0' _is_ included. So since even I myself is confused by this code, better clarify it ;-) Perhaps it would be clearer if changed to (sizeof("do_domain_ids=")-1) (both in my old code and in your new code). (Normally it's good to follow the style of existing code as you did, I just noticed this in the review and realised that my own "existing code" would benefit from these small improvements).
@@ -634,6 +653,16 @@ } }
+ char* do_domain_ids_buf= + mi->domain_id_filter.as_string(Domain_id_filter::DO_DOMAIN_IDS); + if (do_domain_ids_buf == NULL) + DBUG_RETURN(1); + + char* ignore_domain_ids_buf= + mi->domain_id_filter.as_string(Domain_id_filter::IGNORE_DOMAIN_IDS); + if (ignore_domain_ids_buf == NULL) + DBUG_RETURN(1);
You have a memory leak here, do_domain_ids_buf will not be freed if ignore_domain_ids_buf fails due to out-of-memory.
@@ -1348,4 +1382,218 @@ DBUG_RETURN(result); }
+/* ctor */ +Domain_id_filter::Domain_id_filter() : m_filter(false) +{ + for (int i= DO_DOMAIN_IDS; i <= IGNORE_DOMAIN_IDS; i ++) + { + my_init_dynamic_array(&m_domain_ids[i], sizeof(uint32), 16, 16, MYF(0)); + } +} + +/* dtor */ +Domain_id_filter::~Domain_id_filter()
I don't think comments like /* ctor */ and /* dtor */ are useful, better just omit them.
+/** + Update the domain id list of the given type with elements from the new list. + + @param new_ids [IN] new list of ids + @param type [IN] domain id list type to update + @param changed [IN] has the domain id list changed in the + last CHANGE MASTER command? + + @retval void +*/ +void Domain_id_filter::update_ids(DYNAMIC_ARRAY *new_ids, enum_list_type type, + bool changed) +{ + DYNAMIC_ARRAY *ids= &m_domain_ids[type]; + + if (changed == true) + reset_dynamic(ids); + + /* bsearch requires an ordered list. */ + sort_dynamic(new_ids, change_master_domain_id_cmp); + + for (uint i= 0; i < new_ids->elements; i++) + { + uint32 domain_id; + get_dynamic(new_ids, (void *) &domain_id, i); + + if (bsearch((const uint32 *) &domain_id, ids->buffer, ids->elements, + sizeof(uint32), change_master_domain_id_cmp) == NULL) + { + insert_dynamic(ids, (uint32*) &domain_id); + } + } + return; +}
Ok, so I understand that this code is modelled after the existing code for IGNORED_SERVER_IDS. But I found it rather confusing, because it looks like there is a bug: if changed==false, then the code does not ensure that the resulting array will still be sorted. I guess the bug is not possible to trigger, because the list of stuff to add will only be non-empty if changed==true? But that is rather confusing. Please simplify the logic; maybe you only need to call this function when changed==true, so the parameter can be omitted and the array always reset? Also, consider if you can rewrite the existing code for IGNORE_SERVER_IDS so that the same common code can be used as for IGNORE_DOMAIN_IDS and DO_DOMAIN_IDS, that would simplify the code and reduce risk of bugs or inconsistencies.
+/** + Initialize the given domain id list with the numbers from the specified + buffer. + + @param buf [IN] buffer + @param type [IN] domain id list type + + @retval false Success + true Error +*/ +bool Domain_id_filter::init_ids(char *buf, enum_list_type type) +{ + char *token, *last; + uint num_items; + DYNAMIC_ARRAY *ids= &m_domain_ids[type]; + + token= strtok_r(buf, " ", &last); + if (token == NULL) + { + return true; /* error */ + } + + num_items= atoi(token); + for (uint i= 0; i < num_items; i++) + { + token= strtok_r(NULL, " ", &last); + if (token == NULL) + { + return true; /* error */ + } + else + { + uint32 val= atoi(token); + insert_dynamic(ids, (uchar *) &val); + } + } + return false; +}
Again, did you consider using the same code for this and for IGNORE_SERVER_IDS? (I think you did, because you added a comment to init_dynarray_intvar_from_file ;-). Maybe consider it again; it is fine to rewrite/remove the existing code for IGNORE_SERVER_IDS and replace it with your new code. Again, it would reduce risk for bugs or inconsistencies.
+ /* getter */ + bool get_filter() { return m_filter; } + + /* setter */ + void set_filter(uint32 domain_id);
These are not really getter and setters, I think, at least I find those names confusing. Maybe something like do_filter(uint32 domain_id) and is_group_filtered() ? And expand the comments, so that they explain that these functions respectively check a GTID for needing to be filtered, and return whether the current group is being filtered. Also, personally I dislike setter/getter, and would prefer just to access the m_filter as a public member. But I suppose that's style and up to you.
+ /* + Initialize the given domain id list with the numbers from the specified + buffer. + + @param buf [IN] buffer + @param type [IN] domain id list type + + @retval false Success + true Error + */ + bool init_ids(char *buf, enum_list_type type);
Clarify in the comment that the list is initialized from a string of space-separated numbers, the first of which is the number of following entries.
@@ -182,6 +279,7 @@ bool flush_relay_log_cache, bool need_lock_relay_log); int change_master_server_id_cmp(ulong *id1, ulong *id2); +int change_master_domain_id_cmp(uint32 *id1, uint32 *id2);
I think change_master_domain_id_cmp() is only used in rpl_mi.cc, so should be static there, and not refered here in rpl_mi.h
=== modified file 'sql/slave.cc' --- a/sql/slave.cc 2014-08-12 03:55:41 +0000 +++ b/sql/slave.cc 2014-09-12 18:16:22 +0000 @@ -1259,6 +1259,7 @@ going to ignore (to not log them in the relay log). Items being read are supposed to be decimal output of values of a type shorter or equal of @c long and separated by the single space. + It also used to restore DO_DOMAIN_IDS & IGNORE_DOMAIN_IDS lists.
This comment suggests you wanted to share code, then later changed your mind. If you end up keeping the code separate, you still need to delete this comment. (And there is a typo: "It also" -> "It is also", but if you remove it anyway...).
@@ -5601,6 +5610,10 @@ mi->last_queued_gtid= event_gtid; mi->last_queued_gtid_standalone= (gtid_flag & Gtid_log_event::FL_STANDALONE) != 0; + + /* Should filter all the subsequent events in the current GTID group? */ + mi->domain_id_filter.set_filter(event_gtid.domain_id);
So, here we will set m_filter to true, if we want to skip the following group. But I don't see any code that will set it to false at the end of the group? That seems a problem, as we could then end up wrongly skipping all kinds of other events, such as Gtid_list, Incident, ... Also note that it is possible for a GTID group to not be terminated by COMMIT/XID. This can happen at least if the master crashes in the middle of writing a group; the slave will receive only the first part of it, followed by a master restart Format_description event. That case needs to be handled, and you should add a test case for it. Also it is possible for eg. a master downgrade to introduce following event groups that do not start with GTID - seems these would also be wrongly skipped? So we also need a test case for this... Hope this helps, - Kristian.