Hi!

On Mon, Sep 15, 2014 at 10:09 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Kristian Nielsen <knielsen@knielsen-hq.org> writes:
4. Also see detailed comments for some possible problems with the implementation. The most serious is probably to ensure that events are not skipped after the end of the group; we need a couple of tests for this, see
Hm, actually I thought of another potential problem.
What happens if the slave disconnects from the master in the middle of receiving an event group? There are several tricky issues around this part of replication; we definitely need some test cases for this as well.
There are different cases, for example whether GTID or non-GTID mode is used; whether the slave just reconnects, the I/O and/or SQL threads are restarted, or the entire server restarts; and whether the filters are reconfigured before reconnecting.
I have tried to add multiple test scenarios to cover these aspects.
For example, it seems to me that in non-GTID mode, if we restart the server in the middle of receiving an event group, we can easily end up ignoring one half of the group and not the other, which is very bad?
The filtering would only work when the slave is configured to use GTID.
And in GTID mode, the reconnect issue is also quite tricky; we need a test case to check that everything is ok. Though in this case we always reconnect at the start of an event group, so maybe things are easier to handle.
For non-GTID mode, maybe we need to handle the ignoring in the SQL thread instead? Or alternatively, we can make ignoring events based on domain_id only legal in GTID mode, and give an error in non-GTID?
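To make the two concerns above concrete, here is a rough Python model, not the server code. It assumes a simplified event stream in which a "GTID" event opens a group and an "XID" event closes it. All names (`Event`, `check_filter_config`, `filter_events`) are hypothetical and exist only for this illustration. It shows (a) the second alternative, where domain_id filtering outside GTID mode is rejected with an error, and (b) why resuming in the middle of a group with the in-memory "ignoring" state lost lets half of an ignored group through.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Event:
    kind: str                        # "GTID" opens a group, "XID" closes it
    domain_id: Optional[int] = None  # only meaningful on the GTID event


class FilterConfigError(Exception):
    """Raised when domain_id filtering is requested outside GTID mode."""


def check_filter_config(use_gtid: bool, ignore_domain_ids: Set[int]) -> None:
    # Second alternative from the mail: filtering is only legal in GTID mode.
    if ignore_domain_ids and not use_gtid:
        raise FilterConfigError("ignoring by domain_id requires GTID mode")


def filter_events(events: Iterable[Event], ignore_domain_ids: Set[int]) -> List[Event]:
    kept = []
    ignoring = False  # in-memory state only; lost on a restart mid-group
    for ev in events:
        if ev.kind == "GTID":
            ignoring = ev.domain_id in ignore_domain_ids
        if not ignoring:
            kept.append(ev)
        if ev.kind == "XID":
            ignoring = False  # never skip events after the end of the group
    return kept


stream = [
    Event("GTID", 1), Event("QUERY"), Event("XID"),  # group in domain 1
    Event("GTID", 2), Event("QUERY"), Event("XID"),  # group in domain 2
]

# Resuming at a group boundary (as in GTID mode): domain 1 is dropped whole.
whole = filter_events(stream, {1})

# Resuming mid-group with fresh state (non-GTID restart): the tail of the
# domain 1 group is kept although its head was ignored.
partial = filter_events(stream[1:], {1})
```

In this model `whole` contains only the domain 2 group, while `partial` also contains the QUERY and XID events of the domain 1 group, which is the "one half ignored and not the other" situation.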
I think the patch follows the 2nd approach minus the error. Could you please elaborate on the error?

Best,
Nirbhay