And here's response from Monty to my last email. And this response together with the whole thread makes me really worried. It alludes to the fact that all MySQL and MariaDB developers before making any changes should read the whole codebase first and then make sure that their changes work the same way and that they conform to the intentions of original code developers even though there's no way they can talk to those original developers. AFAIK that's not how development on big projects with hundreds of participants works. And it really raises the question for me: what kind of quality the product that is developed with such principles in mind can have? Sergei Golubchik, I believe you are the technical leader of MariaDB project. Could you please tell me what you think about this issue? On Tue, Nov 4, 2014 at 6:32 AM, Michael Widenius <michael.widenius@gmail.com> wrote:
Hi!
"Pavel" == Pavel Ivanov <pivanof@google.com> writes:
Pavel> On Sun, Nov 2, 2014 at 3:27 PM, Michael Widenius <monty@mariadb.org> wrote:
This is the same order that keys are presented in SHOW CREATE TABLE
Pavel> Why is it safe to assume that keys have the same order on master and on slave?
Primary and Unique keys should always be ordered the same way on master and slave. This is handled by the 'sort_keys()' function that you can find at: sql/sql_table.cc, line 2770.
Pavel> This function seem to sort only by "type" of the key. If several Pavel> unique keys have the same "type" according to its classification then Pavel> they will be in the "original order" as the comment in the function Pavel> says. And the original order may be different.
This function ensures that the table will be identical on slave and master as long as one of the following requirements are fulfilled:
- You are using the same create table and alter table statements on both machines. - You are using the same CREATE TABLE statement as 'show create' shows on the master. - You are adding unique keys in the same order on master and slave.
This is always the case when you are using the replication that MySQL and MariaDB uses.
If you are creating tables differently on the slave than on the master, then you are on your own.
So in all normal and most abnormal cases, the table will be created identically on the master and slave, even if the slave is using a different storage engine, have different columns or have additional keys than on the master.
<cut>
I added a test for this to: mysql-test/t/insert.test
--echo # --echo # MDEV-5168: Ensure that we can disable duplicate key warnings --echo # from INSERT IGNORE --echo #
Pavel> This test doesn't check which duplicate key is asserted to execute ON Pavel> DUPLICATE KEY action.
This test verifies that we don't get warnings for the tests. Elinas test verifies which key is used.
Elena added another test for this in mysql-test/suite/storage_engine/insert_with_keys.test
See line 142, that start with: --let $create_definition = a $int_indexed_col UNIQUE KEY, b $int_indexed_col UNIQUE KEY, c $int_col
Pavel> This test seem to check that UPDATE is executed on the correct Pavel> duplicate key, but it's not executed during regular mtr run. How and Pavel> when it's executed in MariaDB?
The storage engine tests are to be used by people that implements their own storage engine for MariaDB.
We run the extended tests from time to time to verify storage engines. We will also run it on some buildbot servers when running releae builds. The storage engine tests is not something that has to be run for normal builds.
What is important is that all storage engine should work the same way. If one returns something different compared to any of the existing engines then it's a bug.
In this case the implicit documentation is how MyISAM, Aria and InnoDB works. All of them returns the first conflicting key.
Pavel> I'm sorry to say, but you've explained exactly the reason why this Pavel> behavior is not guaranteed and can be changed at any time.
It can only be changed if someone adds new code that breaks the storage engine interface, as defined by existing engines and one doesn't check it with the storage engine test suite.
Pavel> Yes, I Pavel> understand that when you wrote the interface you had some particular Pavel> requirements in mind. But ultimately it's up to people actively Pavel> working on the code (who I believe are counted by hundreds now) to Pavel> decide whether to maintain the particular behavior or change it.
It's the MariaDB captains that are responsible to maintain the interface. The only way one can change something is to propose a change and get it accepted.
It's not free for anyone to just change old functionality and expect that to be accepted in the standard version.
In all cases, including this, it's an absolute requirement that engines MUST work exactly as existing engines, so that a user can switch engines and expect them to work identically.
(The only exceptions are things that are impossible to fix, like the differences between transactional and not transactional tables and that there is no defined row order in tables).
If engine writes don't ensure this, then you will get data corruption on the server by just running with different engines.
For case on insert on duplicate key, there is no acceptable reason for engines to work differently.
Pavel> And Pavel> as nobody knows the original requirements.
The original requirements was well defined. Just that non one bother to ask doesn't mean that the requirement didn't exist.
And the fact that all existing engines in MariaDB ensures this shows that the requirements has been understood.
In the case when we notice that two engines works differently, it's up to the MariaDB captains to fix this! The fix is NOT to extend the API and allow things to work differently.
Pavel> and the reasons why they Pavel> can't be changed, and when the behavior is not tested anywhere
We add new tests when we notice that they are needed, as we did in this case. For example, the case with INSERT ... ON DUPLICATE KEY is now tested.
Pavel> and Pavel> doesn't have any documentation even in the weakest form of comments or Pavel> asserts, the decision is usually very easy -- change at will, nobody Pavel> should ever depend on implementation details.
Yes, no one should depend in implementation details, except when the implementation detail is different from other engines and it results in different data.
If that is the case then the engine that does things differently than the existing engines must be changed.
Pavel> Even if you promise to Pavel> fix storage engines that change this behavior, by the time you notice Pavel> the change you'll likely won't be able to fix it (as e.g. in case of Pavel> https://mariadb.atlassian.net/browse/MDEV-6916) or someone will Pavel> already experience a data corruption.
In most cases we can fix this. What was unfortunate with the views code was that we didn't store the MariaDB version number in the view definition. This is to be fixed in 10.1 so that this will never happen again!
Pavel> I'd strongly recommend to revert this change.
Sorry, but I can't agree to revert the change because:
a) Something that can't ever happen in MariaDB (when used correctly with the including engines). b) Allowing a behavior that could for the same command result in different data sets, depending on the storage engine. c) I have not seen a real world
One of my main jobs is to ensure that the storage engine API is evolved in a resonable manner and if there is a conflict between how storage engine works ensure that things are resolved so that the user can expect the same results independent on the storage engine.
Reverting the change would go directly agains the above and also against what I belive is the future of the storage engine API.
Regards, Monty