Hi! First, very sorry that I was not able to answer this question until now. I have been traveling in Asia for 2 weeks with very limited time for internet + email. Now I am however back and hope to answer all possible questions you have on this topic.
"Pavel" == Pavel Ivanov <pivanof@google.com> writes:
Pavel> Hi, Pavel> I've discovered a change in MariaDB 10.0.8 that I don't quite Pavel> understand motivation for and which looks really surprising to me. So Pavel> I would appreciate if you could tell me what's the motivation and why Pavel> you think it's appropriate to do that. Pavel> The change in question is Pavel> http://bazaar.launchpad.net/~maria-captains/maria/10.0/revision/3961.1.7. Pavel> I see several issues with it: Pavel> 1. It introduces a pretty significant variable slave_ddl_exec_mode Pavel> which wasn't announced in the Release Notes. It looks like Daniel accidently forget to mention this in the release notes. I have now added thus to: https://mariadb.com/kb/en/mariadb-1008-release-notes/ I did however originally document this change in: https://mariadb.com/kb/en/create-table/ https://mariadb.com/kb/en/drop-table/ https://mariadb.com/kb/en/mysqld-options-full-list/ https://mariadb.com/kb/en/what-is-mariadb-100/ https://mariadb.com/kb/en/replication-and-binary-log-server-system-variables... and I wrote a very detail blog post why this change was needed: http://monty-says.blogspot.fi/2014/02/the-final-piece-of-puzzle.html Pavel> 2. The default variable value was set to IDEMPOTENT which changes Pavel> MariaDB slave behavior in an incompatible way. Not everyone will want Pavel> to have this variable set to IDEMPOTENT (I'd actually argue that no Pavel> one should set it to IDEMPOTENT), but without any mentioning of this Pavel> variable in the Release Notes people basically have the only chance to Pavel> get knowledge about this change (let me stress out: _incompatible_ Pavel> change) from production incidents or from accidental code inspection. Incompatible in which way? Can you please show me some common scenarions when the current default would not be what you want or expect? My argument is that most people would expect the following from the MariaDB replication: (Most of the following was not true before my patch and is not true for MySQL) - That row based, statement based and mixed mode replication would produce the same number of GTID. - One should be able to have different replication mode on master and slave and still get same number of GTID's. - Row based, statement based and mixed mode replication would produce the same data (was not true before my patch). - If a slave would die, it should be able restart without user intervention as long as the data on the slave and master will be exactly the same. - Statements should always be executed in the same order on the slave and master even if the slave and master used different storage engines (transactional or not). Yes, things are not 100 % compatible with how MySQL does things, but in almost all cases things are better than before. The slave will be able to handle restarts in a lot of common scenarios it could not handle before. Bcause of this I would argue that almost every single user would prefer the new mode compared to how things where before. Pavel> 3. When slave_ddl_exec_mode is set to IDEMPOTENT every "DROP TABLE" Pavel> event in master's binlog is converted into "DROP TABLE IF EXISTS" in Pavel> slave's binlog, which I believe is a major no-no for replication. Why ? In which scenario is this not desirable? Pavel> Binlogs on master and slave should be identical, otherwise after Pavel> failover even if one would set slave_ddl_exec_mode = STRICT it will Pavel> still execute "DROP TABLE IF EXISTS" because new master has that in Pavel> binlogs. The binlogs on master and slave have never been identical, especially if you are using different replication modes. Here is some strange usage: - For DROP of temporary tables IF EXISTS was added in some cases, in other cases the DROP was never written to the binary log. - CREATE IF NOT EXISTS .... LIKE other_table could create different tables based on replication mode. - CREATE ... SELECT was replicated with 1 GTID on statement based mode and 2 GTID's in row based mode. - MySQL 5.6 doesn't even allow one to use CREATE ... SELECT with GTID. The CREATE ... REPLACE patch fixes many cases that before causes data and tables to be different on master and slave. Here is some scenarios of things can go wrong and how the new mode will handle it better than the old one: 1) Repeatable DROP master: DROP TABLE t1,t2; old slave: DROP TABLE t1 crash; restart DROP TABLE t1 - Slave will stop as t1 doesn't exist anymore. New slave: DROP TABLE IF EXISTS t1 ; Drops t1 crash; restart DROP TABLE IF EXISTS t1 DROP TABLE IF EXISTS t2 End result, master and slave are consistent. 2) Repeatable CREATE master: CREATE TABLE t1 (a int); old slave: CREATE TABLE t1 (a int); - Crash before binary log is written ; restart CREATE TABLE t1 (a int); - Fail new slave: CREATE TABLE t1 (a int); - Crash before binary log is written ; restart CREATE OR REPLACE TABLE t1 (a int); This will work (or can easilbe be made to work) in most cases even if the slave fails between the .frm file being rewritten and the table is created in the storage engine. 3) Repatable CREATE ... SELECT master: CREATE TABLE t1 SELECT * from t2; old slave: CREATE TABLE t1 (a int); INSERT INTO t1 SELECT * from t2; - Crash during select ; restart Roll back insert (as part of recovery) CREATE TABLE t1 (a int); - Fail old slave: CREATE TABLE t1 (a int); INSERT INTO t1 SELECT * from t2; - Crash during select ; restart Roll back insert (as part of recovery) DROP TABLE t1; CREATE TABLE t1 (a int); INSERT INTO t1 SELECT * from t2; I don't know of any case where the new default mode would create and inconsistency between master and slave. Why do you think it's matter that DROP TABLE is replicated as DROP TABLE IF EXISTS ? When the slave sees the DROP TABLE, it knows that the table doesn't exists anymore on the master (it doesn't really matter if the table existed or not on the master, does it)? There are two cases how this can be run on the slave: - If the table existed, it will be deleted. - If the table doesn't exist, nothing will happen. In both scenarios the slave will be identical to the master (when it comes to this table). Pavel> 4. Why this change introduces OPTION_GTID_BEGIN in the code? Why Pavel> OPTION_BEGIN wasn't enough? Because the MySQL replication code doesn't honor OPTION_BEGIN. - It may do an implicit commit in many different scenarios. (On example is CREATE TABLE, but there are others) - It may reorder statements within a transaction (if some tables are transactional and others are not) - It allows us to have DDL's in a GTID transaction. When OPTION_GITD_BEGIN is set, all tables are treated as equal, in spite of table type, and no implicit commits are allowed. This help ensures that the slave and master's data will be equal after the statements where executed. (The original code didn't allow that). For example: master (where all tables are InnoDB): BEGIN INSERT into t1 values (1); INSERT into t1 select * from t2; INSERT into t2 values (2); COMMIT; slave (where t2 is MyISAM). The binlog will look like this: BEGIN INSERT into t2 values (2); INSERT into t1 select * from t2; INSERT into t1 values (1,2,3); COMMIT; This is because, by default, MySQL puts all statements that changes non transactional tables first in the binary log. Pavel> From reading the code the only thing I Pavel> understand is that with broken binlogs and sequence of events "GTID Pavel> BEGIN; CREATE TABLE; CREATE TABLE" MariaDB will try to execute both Pavel> CREATE TABLE statements in the same transaction without Pavel> auto-committing at the end. What do you mean with 'broken binlogs' ? How would you be able to create a broken binlog? If the master restarts, it will write a new 'log start' statement that will reset all 'BEGIN' flags, so there is no possible scenario, I know of, that would cause a binary log with two create statements after each other. If it's possible to get such a binary log, then you can also get similar problems with DML's. My patch had nothing to do with trying to work with broken binary logs. Pavel> This sounds pretty crazy to me. Trying to execute broken binary log's sounds very crazy to me too. By probably I don't understand what you really think is a problem. Lets look at this from a different angel: One part of the patch solves how CREATE ... SELECT is logged. Before it was logged in statement based replication: GTID BEGIN CREATE ... SELECT COMMIT and in row based replication as: GTID BEGIN CREATE COMMIT INSERT INTO ... SELECT COMMIT After my patch the second version is logged as: GTID BEGIN CREATE INSERT INTO ... SELECT COMMIT so we have the same number of GTID's in both replication modes. Is this more clear? Pavel> And given Pavel> the fact that in normal situation such sequence of events on the Pavel> master is not possible it seems that existence of both OPTION_BEGIN Pavel> and OPTION_GTID_BEGIN only introduces confusion (MariaDB have GTID Pavel> always on, right? So it should always execute GTID_BEGIN and never Pavel> BEGIN without GTID, right?). As I tried to explain above, OPTION_BEGIN doesn't make sense for a slave, that is why we need OPTION_GTID_BEGIN. And yes, a slave should never execute BEGIN without a GTID. Pavel> So what is your word on this? Can you at least set default value for Pavel> slave_ddl_exec_mode to STRICT so that slave behavior didn't change Pavel> with the version upgrade? I don't think that having a more stable slave is a behavior change. Sorry, but I really don't understand what concerns you have with the new behavior. Is it just that it was not clearly explained, or is there some scenario you are working with where it would be better to get the slave to fail than continue (even when we know that the data on master and slave will be identical). Please give me some examples of the scenarios that you are concerned about! Regards, Monty