Kristian,

Thanks for your prompt reply!

On Wed, Dec 18, 2019 at 08:38:31AM +0100, Kristian Nielsen wrote:
Sergey Vojtovich <notifications@github.com> writes:
ATTN @dr-m, @andrelkin, @SachinSetiya, @knielsen
So IIUC, this is about incorrect usage of ha_truncate() in rpl_slave_state::truncate_state_table().
This is used only for
SET GLOBAL gtid_slave_pos = "..."
when all slave threads are stopped and nothing else is accessing the gtid_pos table.

It is a table, so any client connection can be accessing it at any time?
So it's fine to use ha_truncate() if that can be done easily (and correctly). But it would also be fine just to loop and delete all rows one by one in a normal transaction, if that is simpler. gtid_slave_pos is a small table; there are normally only a few rows per active replication domain.
It is good to have this alternative with less strict locking. I'll leave it up to Andrei to decide if he wants to implement it.
I'm not myself very familiar with details of metadata locking etc. around ha_truncate().
I think the original code worked well initially. Then we got some InnoDB improvement, which made the original code invalid.
But looking at the code now, I don't understand why it only truncates one table? If --gtid-pos-auto-engines is in effect, there could be multiple tables... shouldn't they all be cleared when setting the gtid_slave_pos variable? If so, maybe the delete-rows-one-by-one approach is in any case preferable over ha_truncate, since it can then be done transactionally, to not leave an inconsistent gtid_slave_pos state if one truncate fails and another succeeds?
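
To make the all-or-nothing idea concrete, here is a minimal sketch. It is not MariaDB server code: it uses Python with an in-memory SQLite database as a stand-in, and the table names, the `fail_on` parameter and the helper functions are hypothetical, chosen only for illustration. It shows every position table being emptied with plain deletes inside one transaction, so a failure on any table leaves all of them unchanged.

```python
import sqlite3

# Hypothetical table names: one default table plus one per-engine table.
TABLES = ["gtid_slave_pos", "gtid_slave_pos_innodb"]


def make_demo_db():
    """Create an in-memory database with one row in each position table."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN / COMMIT / ROLLBACK statements below control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    for name in TABLES:
        conn.execute(
            f'CREATE TABLE "{name}" '
            "(domain_id INTEGER, sub_id INTEGER, server_id INTEGER, seq_no INTEGER)"
        )
        conn.execute(f'INSERT INTO "{name}" VALUES (0, 1, 1, 100)')
    return conn


def row_counts(conn):
    """Return {table name: number of rows} for every position table."""
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in TABLES
    }


def clear_all_position_tables(conn, tables, fail_on=None):
    """Delete all rows from every table in a single transaction.

    fail_on is an illustration-only hook that simulates an error when the
    named table is reached. Returns True on commit, False on rollback.
    """
    conn.execute("BEGIN")
    try:
        for name in tables:
            if name == fail_on:
                raise RuntimeError(f"simulated failure on {name}")
            conn.execute(f'DELETE FROM "{name}"')
        conn.execute("COMMIT")
        return True
    except Exception:
        # Undo the deletes already done on earlier tables.
        conn.execute("ROLLBACK")
        return False


if __name__ == "__main__":
    conn = make_demo_db()
    print(clear_all_position_tables(conn, TABLES, fail_on=TABLES[1]), row_counts(conn))
    print(clear_all_position_tables(conn, TABLES), row_counts(conn))
```

With independent truncates, a failure on the second table would leave the first one already emptied; here the rollback restores it, which is the consistency property asked about above.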
Andrei?

Thanks,
Sergey