Kyle Kingsbury via discuss <discuss@lists.mariadb.org> writes:
I've just completed an analysis of isolation levels in MySQL. The report focuses on MySQL, but the test suite runs against MariaDB as well. MariaDB exhibits all the same behaviors--"Repeatable Read" allows a whole bunch of anomalies that should be prohibited under Repeatable Read, including G2-item, G-single (even without predicates), non-repeatable reads, violations of Monotonic Atomic View, etc. Figured this might be of interest to the list.
https://jepsen.io/analyses/mysql-8.0.34
I'm also curious whether this is widely-understood behavior in the MariaDB community, or if it's news to most people. The extant
From a quick read of the link, the main point seems to be about what happens to INSERT/UPDATE/DELETE ("updates") in a REPEATABLE READ transaction. As is mentioned, the basic idea in InnoDB REPEATABLE READ is to use a logical snapshot at the time of transaction start to satisfy SELECTs. I think it was always obvious to me that this cannot work for updates. Updates have to modify the current version of the database, not some old version. So I have always thought of REPEATABLE READ to be "repeatable" for read-only transactions spanning multiple selects. The primary purpose of the way locking is done in InnoDB is to support statement-based asynchronous replication. The idea is that taking the update statements and re-running them transaction by transaction, in commit order, will produce identical data. Similar to SERIALIZABLE, but for the INSERT/UPDATE/DELETE statements, not necessarily the SELECTs. The ability to replicate using statements is very important for performance. Queries that modify a large number of rows otherwise have to write huge amounts of replication data ("binlogs" in MySQL/MariaDB terms) recording the contents of every single changed database row. As is also remarked in the link, using statements for replication is also very complicated to do correctly in all corner cases, and in practise quite fragile, unfortunately. In the end, it is an evolutionary compromise between getting flexibility and performance on the one hand, and sufficiently correct behaviour on the other. My personal opinion is that the success of the database is in large part due to the constant focus on ensuring the flexibility and performance needed by practical use. But the resulting complexity and fragility of replication is a very real problem for users, and needs improvement. So this kind of deep analysis that you have done is very valuable I think, to understand the current situation and point out current weaknesses. As to concrete suggestions, getting documented what REPEATABLE READ actually does would be very valuable I think. As to the SQL standard, I am not intimately familiar with it, but the inadequacy of the isolation levels as defined by the standard has been "folklore" in the community for decades. Unlike something like the C/C++ standard, the SQL standard seems to be far removed from the needs of users, at least in the applications that typically employ MySQL/MariaDB. And from my experience, people are resigned rather than hopeful that this could ever change. This is sad, as a good standard could help improve the portability of applications between different SQL implementations. - Kristian.