Pavel Ivanov <pivanof@google.com> writes:
See logs in attachment. It looks clear to me that relay logs get replayed after crash and don't get deleted. I'm not sure though if the reason of that can be seen in the logs.
No, I agree it's not clear from the logs one way or the other, so good that you mention this.
Can you point me to the code where relay logs are deleted? I'll try to check why it's not called...
It is this, in slave.cc start_slave_threads(): if (mi->using_gtid != Master_info::USE_GTID_NO && !mi->slave_running && !mi->rli.slave_running) { purge_relay_logs(&mi->rli, NULL, 0, &errmsg); mi->master_log_name[0]= 0; mi->master_log_pos= 0; } When both slave IO and SQL threads are stopped, we purge the relay logs before starting. And surely they must be stopped when we first start them after a crash ... And then in get_master_version_and_clock(), if we deleted the relay logs then we start from the GTID position: if (mi->using_gtid != Master_info::USE_GTID_NO && !mi->master_log_name[0]) { ... I would suggest printouts to the error log around those two places in the code and check that both are executed as expected at startup. Maybe either if() condition becomes false for some reason, or purge_relay_logs() fails to purge, but I do not see how from the code at the moment. It would also help if I had implemented MDEV-4491 (https://mariadb.atlassian.net/browse/MDEV-4491), to give better printouts in the error log about where and how the slave is actually connecting to the master. I will try to get this done soon. One thing did spring to mind as I was looking at this code, the XtraDB --innodb_recovery_update_relay_log option. This can overwrite relay-log information during crash recovery, so might be able to interfere. But it is off by default and does printouts to error log that are not seen in your logs, so it seems this is not related ... Thanks for helping look into this. I actually plan to change this code to something more robust, but it would be good to find the real problem here, otherwise I may end up just hiding the bug, not fixing it ... - Kristian.