andrei.elkin@pp.inet.fi writes:
I read through your analysis of the case and agree with the plan and its motivation:
MDEV-6608> Generally, it is a very bad idea to destroy data during crash recovery ... try an approach that leaves the crashed binlog file intact, and instead ensures that the crashed binlog can be properly handled by the rest of the code.
So in the case of a partially written event group (transaction), the short (and therefore xid-less) transaction is to be rolled back on the master.
So why don't we properly close the group in the binlog at the end of recovery, which means appending a ROLLBACK Query-log-event? Naturally, the last event of the group needs a sanity check before that.
Is that what you actually had in mind?
What I had in mind was to not modify the crashed binlog file at all.

The logic for detecting a binlog group that was not properly closed at the end of a crashed binlog should be more or less the same whether we do it during binlog recovery or on the fly, either on the master in the binlog send thread, or on the slave in the IO thread (writing the relay log) or the SQL thread (reading and executing events from the relay log). And I think it's arguably more robust to do it on the slave.

Generally, it is better to be robust against corrupt data when reading it. Here, the slave completely corrupts its internal state (binlog position) just because one event was missing in the binlog stream. That seems to be simply a bug in the slave code, regardless of what caused this to happen in the first place.

I do agree that appending to a crashed binlog is "less bad" than truncating it.

 - Kristian.
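
For illustration only, the detection logic discussed above could look roughly like the following minimal sketch. It is not taken from the server code: the event kinds are reduced to plain strings, and the names `GROUP_START`, `GROUP_END`, `find_unterminated_group` and `events_safe_to_apply` are hypothetical. It assumes a group starts with a GTID event and ends with an XID, COMMIT or ROLLBACK event, and it only reads the event list, so the crashed binlog itself stays untouched.

```python
from typing import List, Optional

# Hypothetical, simplified event kinds (assumption): real binlog events
# carry much more state than a single string.
GROUP_START = "GTID"
GROUP_END = {"XID", "COMMIT", "ROLLBACK"}


def find_unterminated_group(events: List[str]) -> Optional[int]:
    """Return the index where a trailing, unterminated group starts.

    Returns None when the last group in the list was properly closed
    (or when there is no group at all).
    """
    open_at: Optional[int] = None
    for i, kind in enumerate(events):
        if kind == GROUP_START:
            # A new group begins; this sketch only tracks the latest one,
            # because only the tail of the file is of interest here.
            open_at = i
        elif kind in GROUP_END:
            # The currently open group (if any) is properly closed.
            open_at = None
    return open_at


def events_safe_to_apply(events: List[str]) -> List[str]:
    """Return the events minus a trailing partial group.

    The input list is not modified, mirroring the idea of leaving the
    crashed binlog file intact and handling the problem on read.
    """
    cut = find_unterminated_group(events)
    return events if cut is None else events[:cut]


if __name__ == "__main__":
    crashed = ["GTID", "QUERY", "XID", "GTID", "ROWS"]
    print(find_unterminated_group(crashed))
    print(events_safe_to_apply(crashed))
```

The same check could in principle run in any of the places named above (recovery, the binlog send thread, the IO thread or the SQL thread), since it depends only on the sequence of events read.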