Monty, can you please subscribe all your email addresses to maria-developers@? People reading the list cannot see your emails. I'm forwarding here the unaltered message from Monty that I was answering in my last email.

On Sun, Nov 2, 2014 at 3:27 PM, Michael Widenius <monty@mariadb.org> wrote:
Hi!
Once again I have to apologize for a late reply. After my last reply to your topic I traveled to China and noticed that the firewall there has gotten much better: you can't access Google, Twitter, Facebook, Blogspot etc. :( I tried to connect over a VPN, but that didn't work for me either.
It did feel a bit strange to not be able to do computer work for more than a week, but the lovely people in China who took care of me did compensate for that.
"Pavel" == Pavel Ivanov <pivanof@google.com> writes:
Pavel> Which key is "the first duplicate key".
The first key that has a duplicate value in the order of keys in the KEY structure that MariaDB provides for the storage engine.
Pavel> I'm definitely not that familiar with the relevant MariaDB code, but I
Pavel> didn't find KEY structure to be passed to any insert-related function.
Pavel> Can you elaborate (preferably with code references) how this
Pavel> restriction is supposed to work?
The following code examples are from 10.0.
All handler functions get access to the TABLE object, which defines the record format and the keys.
The keys are stored in TABLE->key_info (see table.h, line 1062)
When one reads a key, one refers to keys according to the number in the key_info structure. For example:
A set of reads by key is done by calling:
  handler->ha_index_init(key_number,..);
  read_until_no_more_rows
    handler->ha_index_read_map()
  handler->ha_index_end()
When one calls handler->info() to ask which key caused a duplicate error, it's supposed to store the first duplicate key (the key with the smallest number) in handler->errkey.
This is the same order in which keys are presented in SHOW CREATE TABLE.
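The "first duplicate key" rule can be illustrated with a small standalone model. This is only a sketch: ToyEngine, insert() and kNoDuplicate are hypothetical names and not part of the MariaDB handler API. The model assumes every key is a single-column unique integer key, numbered in the same order as TABLE->key_info.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical return value meaning "row inserted, no duplicate found".
constexpr int kNoDuplicate = -1;

// Hypothetical stand-in for a storage engine; NOT the MariaDB handler API.
// A row holds one integer per unique key; key numbers follow the order the
// keys would have in TABLE->key_info.
struct ToyEngine {
    explicit ToyEngine(std::size_t key_count) : indexes(key_count) {}

    // Tries to insert a row. On a conflict, returns the smallest key number
    // whose value already exists (the value the mail says belongs in errkey)
    // and leaves the table unchanged.
    int insert(const std::vector<int>& row) {
        assert(row.size() == indexes.size());
        for (std::size_t key = 0; key < indexes.size(); ++key) {
            if (indexes[key].count(row[key]) != 0) {
                return static_cast<int>(key);  // first duplicate in key order
            }
        }
        for (std::size_t key = 0; key < indexes.size(); ++key) {
            indexes[key].insert(row[key]);
        }
        return kNoDuplicate;
    }

    std::vector<std::set<int>> indexes;  // one set of stored values per key
};
```

With two keys and the rows (1, 10) and (2, 20) already stored, inserting (1, 20) conflicts on both keys, and the model reports key 0 because it has the smaller number.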
Pavel> Why is it safe to assume that keys have the same order on master and on slave?
Primary and Unique keys should always be ordered the same way on master and slave. This is handled by the 'sort_keys()' function that you can find at: sql/sql_table.cc, line 2770.
The only time this does not hold is if you add a unique key on the slave that was not on the master and there are conflicts when you insert things because of this key.
However, if you do this, I would say that you are on your own.
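As a rough illustration of why a deterministic sort gives the same key numbering on master and slave for the same table definition, here is a simplified model. It is not a copy of sort_keys(): KeyDef, key_rank() and canonical_key_order() are hypothetical names, and the real function applies further rules (for example around nullable key parts) that are left out here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical key description; NOT MariaDB's KEY structure.
struct KeyDef {
    std::string name;
    bool is_primary;
    bool is_unique;
};

// Simplified ranking assumed for this sketch:
// primary key first, then unique keys, then all other keys.
int key_rank(const KeyDef& key) {
    if (key.is_primary) return 0;
    if (key.is_unique) return 1;
    return 2;
}

// Returns key names in the order the sketch would number them.
// std::stable_sort keeps the declaration order among keys of equal rank.
std::vector<std::string> canonical_key_order(std::vector<KeyDef> keys) {
    std::stable_sort(keys.begin(), keys.end(),
                     [](const KeyDef& a, const KeyDef& b) {
                         return key_rank(a) < key_rank(b);
                     });
    std::vector<std::string> names;
    for (const KeyDef& key : keys) {
        names.push_back(key.name);
    }
    return names;
}
```

Because the result depends only on the table definition, two servers that run the same CREATE TABLE end up with the same numbering in this model. A unique key added only on the slave changes the input, which is the exception described above.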
To help engine writers we are constantly adding new tests to the storage engine test suite, to make it easier for storage engine vendors to test that their storage engine follows the interface. We have now added a test to check for the correct behaviour for the case of INSERT ... ON DUPLICATE KEY.
Pavel> This is very interesting. Can you say which test it is and what
Pavel> exactly does it test? Is it present in 10.0.14 or it's only at the
Pavel> latest 10.0 branch?
I added a test for this to: mysql-test/t/insert.test
--echo #
--echo # MDEV-5168: Ensure that we can disable duplicate key warnings
--echo # from INSERT IGNORE
--echo #
Elena added another test for this in mysql-test/suite/storage_engine/insert_with_keys.test
See line 142, which starts with:
--let $create_definition = a $int_indexed_col UNIQUE KEY, b $int_indexed_col UNIQUE KEY, c $int_col
Do you have or use a storage engine that doesn't provide the expected value for the conflicting key ?
Pavel> I don't have a specific example of storage engine that detects
Pavel> duplicate key in different order, but I'm not convinced and don't have
Pavel> hard proof that such engine cannot exist and that InnoDB itself is not
Pavel> susceptible to this problem. I can easily imagine though that even
Pavel> given the particular order of keys in table definition the key
Pavel> reported as having duplicate can depend on the way storage engine
Pavel> stores those keys.
As key order is defined on the upper level, above InnoDB, this should be 100 % safe (at least for InnoDB).
For other storage engines, this is safe for any engine that tests for duplicate keys in key order.
If we ever find a case where it's not safe with any storage engine that we deliver with MariaDB, we will fix the storage engine.
It's ok if they do things differently so that they can detect duplicates for other keys faster.
However, it has been required from all storage engines from the start that if one calls 'handler->info()' to request which key was duplicated, they must return the first key.
If not, they are breaking the storage engine interface (both for MySQL and MariaDB) as there is no way the upper level can guarantee that things will work identically between storage engines.
Pavel> BTW, can you point to MySQL documentation and/or comment in the code
Pavel> that indicates this requirement to storage engine implementation?
Most of the storage engine interface is not 100 % documented. This is one of those cases. However, I wrote the code related to errkey and this is how it was supposed to work. In general, if something is not documented, then it's up to the people who created the interface or are actively working on it to define how it should work.
What is important is that all storage engines should work the same way. If one returns something different compared to any of the existing engines, then it's a bug.
In this case the implicit documentation is how MyISAM, Aria and InnoDB work. All of them return the first conflicting key.
Pavel> On the related note Release Notes mention the error log flood
Pavel> protection. Why it's not behind flag? How should I disable the feature
Pavel> and get back to the old behavior?
We already had in MariaDB flood protection for some error messages.
Pavel> Can you say exactly which error messages are suppressed already? Is
Pavel> there a documentation link about that?
All 'unsafe logging' errors are suppressed. See enum enum_binlog_stmt_unsafe in sql_lex.h (This is what I know).
I am happy to look into adding a flag that is usable for any kind of suppressed messages.
A couple of ways to do this:
a) Not suppress anything if you give --log-warnings a value > 2
b) Make LIMIT_UNSAFE_WARNING_ACTIVATION_TIMEOUT a configurable variable, where 0 would mean no suppression.
Which of the above, or both, would you prefer? Or do you have any other suggestion of how to do this?
Pavel> Reading documentation on --log-warnings I don't see how it can be
Pavel> related to the error suppression. So I think making
Pavel> LIMIT_UNSAFE_WARNING_ACTIVATION_TIMEOUT configurable would be much
Pavel> better, though it should apply to all possible error suppressions.
Currently we only have suppression for a limited set of 'unsafe' errors, but I agree that when we add a flag we should name this with a general name to be able to reuse it for all possible future suppressions.
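To make option b) concrete, here is a minimal sketch of a flood limiter with a configurable window where 0 means no suppression. The class name, the threshold parameter and the windowing policy are assumptions made for illustration; they do not mirror how the server implements its suppression.

```cpp
#include <cstdint>

// Hypothetical flood limiter; names and policy are illustrative only.
class FloodLimiter {
public:
    // window_seconds == 0 means "never suppress", as proposed in option b).
    // threshold is the number of messages allowed per window (an assumption).
    FloodLimiter(std::uint64_t window_seconds, unsigned threshold)
        : window_(window_seconds), threshold_(threshold) {}

    // Returns true if a message arriving at time `now` (in seconds) should
    // be written to the log, false if it should be suppressed.
    // Assumes `now` never goes backwards between calls.
    bool should_log(std::uint64_t now) {
        if (window_ == 0) {
            return true;  // suppression disabled by configuration
        }
        if (count_ == 0 || now - window_start_ >= window_) {
            window_start_ = now;  // start a new counting window
            count_ = 0;
        }
        ++count_;
        return count_ <= threshold_;
    }

private:
    std::uint64_t window_;
    unsigned threshold_;
    std::uint64_t window_start_ = 0;
    unsigned count_ = 0;
};
```

A general name for the variable, as suggested above, would let the same setting control other suppressed messages later; in this sketch that would simply mean one limiter instance per message class sharing the same configured window.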
In 10.1 we are looking at adding a feature that one can suppress any error from the error log, but that's a different topic.
Regards,
Monty