Kristian,

As I understand it, when a slave connects to the master and wants to start replicating, it passes the GTID to start from. The master finds the binlog file where the earliest GTID is located and then scans through that file to find the exact binlog position from which to start sending binlog events.

If this binlog file is large, the scan can take a very long time, and I guess it takes especially long when several slaves try to start replicating at roughly the same time. We observed 60-90 seconds between the slave's initial connection and the first real binlog events starting to flow. During this period the slave receives nothing from the master, so it is very easy to mistake the situation for a lost connection. The slave then hits slave_net_timeout, reopens the connection to the master, and forces it to start searching through the binlog file from the very beginning again...

Putting aside the question of what value is good enough for slave_net_timeout, I'd say that in any case a slave taking 60 seconds just to start receiving binlog events from the master is unacceptable.

Have you thought about this problem before? Maybe you've even already planned to implement some solution for it?

Thank you,
Pavel
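P.S. To make the cost concrete, here is a rough Python sketch of the lookup as I understand it. It is a simplified model under my own assumptions, not the actual server code: GTIDs are reduced to plain sequence numbers in a single replication domain, each binlog file is assumed to know the first GTID it contains, and BinlogFile, make_file and find_start_position are hypothetical names. Picking the file is cheap; the cost is the linear scan inside it, which grows with the size of the file.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class BinlogFile:
    # Hypothetical model: sequence number of the first GTID in this file
    # (a stand-in for the GTID state recorded at the start of a binlog).
    start_seq: int
    # Events in file order, each modelled as (seq_no, byte_offset).
    events: list = field(default_factory=list)


def make_file(start_seq, count, event_size=100):
    """Hypothetical helper: build a file with `count` consecutive GTIDs."""
    events = [(start_seq + i, i * event_size) for i in range(count)]
    return BinlogFile(start_seq, events)


def find_start_position(files, requested_seq):
    """Return (file_index, event_index, events_scanned) for the first event
    after requested_seq, using a 'pick the file, then scan it' lookup."""
    starts = [f.start_seq for f in files]
    # Step 1 (cheap): last file whose first GTID is <= requested_seq.
    idx = bisect_right(starts, requested_seq) - 1
    if idx < 0:
        raise ValueError("requested GTID is older than the oldest binlog")
    # Step 2 (expensive): walk the file event by event until we pass it.
    scanned = 0
    for i, (seq, _offset) in enumerate(files[idx].events):
        scanned += 1
        if seq > requested_seq:
            return idx, i, scanned
    # The requested GTID was the last one in this file, so sending starts
    # at the beginning of the next file (which may not exist yet).
    return idx + 1, 0, scanned
```

For example, with two files of 100,000 events each, a slave asking to start after GTID 199,990 makes the master walk 99,991 events of the second file before it can send anything, and with real multi-gigabyte binlogs that walk is where I suspect the 60-90 seconds go.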