
Andrei Elkin <andrei.elkin@mariadb.com> writes:
This is not the way to communicate back from engine to server. Instead, simply return a different error code than HA_ERR_LOCK_WAIT_TIMEOUT from InnoDB that you can check for here. You can maybe use one of the existing HA_ERR_* from include/my_base.h, or add a new one if necessary.
However, adding a new error is clearly more intrusive.
The new error is completely handled inside the code you add for signalling the condition inside InnoDB locking code and handling it in replication row event apply. It will not be visible to users. So it is not intrusive to add a new one. Just be careful to not pick a number that conflicts with another entry in a later branch, if you do this in a GA release. Alternatively, you can use the existing HA_ERR_ROW_NOT_VISIBLE error, which is used in a somewhat similar way in storage/maria/ma_blockrec.c and nowhere else. But I would just add a new error for clarity.
And at the same time I thought a some sort of runtime state the slave applier had always missed. So the intent of this struct is future-minded as well.
This is not the point. Even if the rgi->exec_flags already existed, it would still be wrong to use it for this. Here is the code where you are handling the condition:
+ if (error == HA_ERR_LOCK_WAIT_TIMEOUT && !is_index_unique && + (rgi->exec_flags & (1 << rpl_group_info::HIT_BUSY_INDEX))) + { + rgi->exec_flags &= ~(1 << rpl_group_info::HIT_BUSY_INDEX); + skip_first_compare= true;
So you _already_ are using a specific error code to signal the condition from InnoDB to replication, the error HA_ERR_LOCK_WAIT_TIMEOUT. But this error can also be thrown in other situations, so you need to add these extra conditions to distinguish. So essentially, you are using the wrong error code to signal the condition. Use a correct error code that uniquely identifies the condition that occured, and then you can avoid these extra conditions and simplify the code.
I would be fine with either method.
Good, then please change it to return a proper error code. The only reason that you are using HA_ERR_LOCK_WAIT_TIMEOUT is that then you could do the quick change inside InnoDB to set its timeout value to 0, and this happens to then propagate the HA_ERR_LOCK_WAIT_TIMEOUT up to the callers without waiting. But that's not an appropriate way to design new code and APIs. Yes, it is frustrating when conceptually simple changes need extra effort, due to the need to work in an existing old and complex code base like MariaDB. But that is just the way it is, it is the same for all of us. - Kristian.