Pavel> Which key is "the first duplicate key".
The first key that has a duplicate value in the order of keys in the KEY structure that MariaDB provides for the storage engine.
I'm definitely not that familiar with the relevant MariaDB code, but I didn't find KEY structure to be passed to any insert-related function. Can you elaborate (preferably with code references) how this restriction is supposed to work?
This is the same order that keys are presented in SHOW CREATE TABLE
Why is it safe to assume that keys have the same order on master and on slave?
To help engine writes we are constantly adding new test in the storage engine test suite, to make it easier for storage engine vendors to test that their storage engine follows the interface. We have now added a test to check for the correct behaviour for the case of INSERT ... ON DUPLICATE KEY.
This is very interesting. Can you say which test it is and what exactly does it test? Is it present in 10.0.14 or it's only at the latest 10.0 branch?
Do you have or use a storage engine that doesn't provide the expected value for the conflicting key ?
I don't have a specific example of storage engine that detects duplicate key in different order, but I'm not convinced and don't have hard proof that such engine cannot exist and that InnoDB itself is not susceptible to this problem. I can easily imagine though that even given the particular order of keys in table definition the key reported as having duplicate can depend on the way storage engine stores those keys. BTW, can you point to MySQL documentation and/or comment in the code that indicates this requirement to storage engine implementation?
Pavel> On the related note Release Notes mention the error log flood Pavel> protection. Why it's not behind flag? How should I disable the feature Pavel> and get back to the old behavior?
We already had in MariaDB flood protection for some error messages.
Can you say exactly which error messages are suppressed already? Is there a documentation link about that?
I am happy to look into adding a flag that is usable for any kind of suppressed messages.
A couple of ways to do this:
a) Not suppress anything if you give --log-warnings a value > 2 b) make LIMIT_UNSAFE_WARNING_ACTIVATION_TIMEOUT a configurable variable, where 0 would mean no suppression.
Which of the above, or both, would you prefer? Or do you have any other suggestion of how to do this?
Reading documentation on --log-warnings I don't see how it can be related to the error suppression. So I think making LIMIT_UNSAFE_WARNING_ACTIVATION_TIMEOUT configurable would be much better, though it should apply to all possible error suppressions. Thank you, Pavel On Fri, Oct 17, 2014 at 1:43 PM, Michael Widenius <michael.widenius@gmail.com> wrote:
Hi!
First sorry for the delayed reply; I am at a conference in Oxford and didn't have an reliable internet connection until now.
"Pavel" == Pavel Ivanov <pivanof@google.com> writes:
Pavel> Hi, Pavel> I see that in MariaDB 10.0.14 Michael disabled replication safety Pavel> warning for the case of INSERT ON DUPLICATE KEY UPDATE with several Pavel> unique keys with a comment "We should assume that the store engine Pavel> will report the first duplicate key for this case." Pavel> http://bazaar.launchpad.net/~maria-captains/maria/10.0/revision/4400.1.3
Pavel> What does that mean?
That we don't anymore give a warning for things that can't happen in the distributed MariaDB server.
Pabel> Which key is "the first duplicate key".
The first key that has a duplicate value in the order of keys in the KEY structure that MariaDB provides for the storage engine.
This is the same order that keys are presented in SHOW CREATE TABLE
Pavel> and why Pavel> MariaDB relies on an engine behavior which AFAIK is not guaranteed?
This is guaranteed for all storage engines shipped with MariaDB and should also be guaranteed for any storage engine that wants to follow the MariaDB and even the original MySQL storage engine interface.
The change is also in line with how I designed the part of the storage engine interface that gives the SQL layer the information about which key that was conflicted. As we use this for things like INSERT ... ON DUPLICATE KEY, it's critical that all storage engine provides the same information in case of conflicts.
To help engine writes we are constantly adding new test in the storage engine test suite, to make it easier for storage engine vendors to test that their storage engine follows the interface. We have now added a test to check for the correct behaviour for the case of INSERT ... ON DUPLICATE KEY.
What is important when it comes to the storage engine interface is that in all possible cases, a query should work identically against 2 different storage engines.
The only 'allowed difference's that I can think of for the moment are:
- Different storage engines may retrieve rows in different order. - In case of errors, changes to non transactional tables will not roll back. - Automatically generated key values may be different between storage engines. (but preferably they should not be that). However when things are replicated the slave, it should get the same values as the master.
In the case of INSERT ... ON DUPLICATE KEY, it was clear that someone had broke the storage engine interface, for no good reason. Instead of users having to depend on 'unsafe, different and undocumented behavior' for which row as updated, it's better to fix this issue once and for all, especially as fixing this is trivial.
Do you have or use a storage engine that doesn't provide the expected value for the conflicting key ?
If yes, there is a couple of ways to fix this. I am happy to talk with you to find a solution that will satisfy your needs and ours.
I underatand that in many cases it's not critical which unique key the storage engine reports as duplicate. However in some cases, it's critical to get the first key.
I could add a new extra call to inform the engine that we now require the first duplicate key and not just any conflicting key. This would be used for example for INSERT ... ON DUPLICATE KEY ...
I would prefer to have the check for which key was duplicated in the storage engine. However, if critically needed, and assuming there is more than one storage engine that needs this, we could add a check on the handler level that will do a lookup on all unique keys to find the first duplicate one.
What solution would you prefer?
Pavel> And why this serious change in behavior wasn't mentioned in the Pavel> Release Notes?
I didn't see it as a a serious change as it's only about removing a warning that has no value for any normal MariaDB user (who only use storage engines supplied with MariadB) and which actually caused serious problems for some MariaDB users.
That said, I agree it was a mistake from my part to not notice that this was missing in the release notes. I will try to do better in the future and check that the release notes correctly reflects any changes that I have done.
I appologize for missing this in this case.
Pavel> On the related note Release Notes mention the error log flood Pavel> protection. Why it's not behind flag? How should I disable the feature Pavel> and get back to the old behavior?
We already had in MariaDB flood protection for some error messages. What I did was to make it more general and extend it to also include other error messages, like the one we are discussing. As there wasn't a flag for the old suppression, I didn't see a need for adding a flag for the new suppression.
I am happy to look into adding a flag that is usable for any kind of suppressed messages.
A couple of ways to do this:
a) Not suppress anything if you give --log-warnings a value > 2 b) make LIMIT_UNSAFE_WARNING_ACTIVATION_TIMEOUT a configurable variable, where 0 would mean no suppression.
Which of the above, or both, would you prefer? Or do you have any other suggestion of how to do this?
For 10.1 we will probably add a mechanism where you can permanently suppress any specific error message to be written to the error log. (They would still be given to the client).
Any suggestion or wishes regarding this?
Regards, Monty