Hi!

On Mon, Sep 15, 2014 at 10:09 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
> Kristian Nielsen <knielsen@knielsen-hq.org> writes:
>
> > 4. Also see detailed comments for some possible problems with the
> > implementation. The most serious is probably to ensure that events are not
> > skipped after the end of the group, we need a couple of tests for this, see
>
> Hm, actually I thought of another potential problem.
>
> What happens if the slave disconnects from the master in the middle of
> receiving an event group? There are several tricky issues around this part of
> replication, we definitely need some test cases for this as well.
>
> There are different cases, for example whether using GTID or non-GTID mode,
> whether the slave just reconnects, the I/O and/or SQL threads are restarted,
> or the entire server restarts. And if the filters are reconfigured before
> reconnecting.

I have tried to add multiple test scenarios to cover these aspects.


> For example, it seems to me that in non-GTID mode, if we restart the server in
> the middle of receiving an event group, we can easily end up with ignoring one
> half of the group and not the other, which is very bad?

The filtering would only work when the slave is configured to use GTID.

> And in GTID mode, the reconnect issue is quite tricky also, we need a test
> case to check that everything is ok. Though in this case we always reconnect
> at the start of an event group, so maybe things are easier to handle.
>
> For non-GTID mode, maybe we need to handle the ignoring in the SQL thread
> instead? Or alternatively, we can make ignoring events based on domain_id only
> legal in GTID mode, and give an error in non-GTID?

I think the patch follows the second approach, minus the error. Could you please
elaborate on what error you have in mind, and at which point it should be raised?
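
To make the question concrete, here is a minimal Python sketch of how I read
the second approach with the error included. It is only an illustration under
my own assumptions: SlaveConfig, FilterConfigError, validate and
should_ignore_group are hypothetical names, not taken from the patch or from
the server code.

```python
from dataclasses import dataclass, field


class FilterConfigError(ValueError):
    """Raised when domain_id filters are combined with non-GTID replication."""


@dataclass
class SlaveConfig:
    # Hypothetical stand-in for the slave's connection settings.
    use_gtid: bool
    ignore_domain_ids: set = field(default_factory=set)


def validate(cfg: SlaveConfig) -> None:
    # Approach 2: domain_id based filtering is only legal in GTID mode.
    # Rejecting the configuration up front avoids ever filtering half a group.
    if cfg.ignore_domain_ids and not cfg.use_gtid:
        raise FilterConfigError("domain_id filtering requires GTID mode")


def should_ignore_group(cfg: SlaveConfig, domain_id: int) -> bool:
    # The decision is made once per event group, from the domain_id carried
    # by the GTID that starts the group.
    validate(cfg)
    return domain_id in cfg.ignore_domain_ids
```

In this sketch the check runs when the configuration is validated, so a
non-GTID slave with filters set fails immediately instead of silently not
filtering.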

Best.

- Nirbhay