[Maria-developers] Conservative parallel slave is "too optimistic" for certain engines
Hello, This comes from MDEV-16242. == Symptoms == When one runs a parallel slave (mode=conservative) and replicates DML for MyRocks table without a Primary Key, replication may stop with a ER_KEY_NOT_FOUND error. This may happen even if the queries were run on the master sequentially. == A detail about conservative replication == Suppose the master runs these two statements, sequentially: INSERT INTO t1 VALUES (5),(6),(7); DELETE FROM t1; Parallel slave may schedule the INSERT to Thread1, and the DELETE to Thread2. In conservative parallel replication, the DELETE "will wait" to be executed after the INSERT. One may expect that "will wait" here means that DELETE execution does not start until the INSERT has committed. But it's more subtle than that: actually, DELETE execution will start as soon as the INSERT is ready to do the prepare/commit steps. I assume this was done to increase the parallelism: initial phases of DELETE can be ran in parallel with commit/prepare steps of the INSERT. This is safe to do: 1. The INSERT has acquired locks for all rows it is about to touch. DELETE will not be able to prevent INSERT from committing. 2. The DELETE starts its execution on a database that doesn't include the results of the INSERT (the INSERT has not committed yet!). But this is fine, because the DELETE locks the rows it is about to modify. If it attempts to access a row that the INSERT is about to insert, it will block on a lock until the INSERT finishes. == Applying this to MyRocks == The critical part is #2. It works if the DELETE command will acquire row locks for rows it about to DELETE. This is normally true: If the storage engine supports gap locks, attempting to read a locked gap will cause DELETE to wait for INSERT If the storage engine doesn't support gap locks, it will use Row-Based-Replication. RBR's Delete_rows_event includes Primary Key value of the row that should be deleted. Doing a point lookup on PK will cause the DELETE to block on the row lock. The remaining case is - storage engine that doesn't support gap locks - Row-based-replication of table without primary key In this case, finding the row to delete is done by scanning the table until we find the row where all columns match. The scan does not see the rows that the INSERT is about to INSERT, and DELETE stops with an error, "we could not find the row to delete". == Possible solutions == 1. Implement gap locking in MyRocks. This will take some time. 2. Change parallel slave to wait *for commit*. This should only be done if tables that are updated do not support Gap Locking. This is hard, it will require making risky changes on the SQL layer. 3. Disallow parallel slave execution for tables without Gap Locking and without primary key. (Looks most promising but I am not sure if it is sufficient). BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
Sergey Petrunia
== Symptoms == When one runs a parallel slave (mode=conservative) and replicates DML for MyRocks table without a Primary Key, replication may stop with a ER_KEY_NOT_FOUND error.
== A detail about conservative replication ==
The remaining case is - storage engine that doesn't support gap locks - Row-based-replication of table without primary key
In this case, finding the row to delete is done by scanning the table until we find the row where all columns match. The scan does not see the rows that the INSERT is about to INSERT, and DELETE stops with an error, "we could not find the row to delete".
Nice analysis. Normally it is guaranteed to be safe to start the next transaction in parallel with the commit step of a prior. But as you describe, RBR on table with no primary key is special, since it does not really correspond to any SQL-level operation.
2. Change parallel slave to wait *for commit*. This should only be done if tables that are updated do not support Gap Locking. This is hard, it will require making risky changes on the SQL layer.
I think you can just call thd->wait_for_prior_commit() in the code that does the RBR table scan on a table with no primary key. Eg. Rows_log_event::find_row(): if (m_key_info) { ... } else { DBUG_PRINT("info",("locating record using table scan (rnd_next)")); thd->wait_for_prior_commit(); ... } (Or do it only once, maybe in Rows_log_event::find_key() or similar, and only if rli->mi->parallel_mode==SLAVE_PARALLEL_CONSERVATIVE). This makes thread 2 wait for after thread 1 has completed commit, as needed. Similar code is found in temporary_tables.cc, since temporary tables also requires a full wait to be safe for parallel replication. This will theoretically penalise a bit RBR transactions on tables without primary key in conservative parallel replication. Well, RBR without primary key has other problems and is strongly discouraged; and I don't know how useful conservative mode is compared to optimistic.
3. Disallow parallel slave execution for tables without Gap Locking and without primary key. (Looks most promising but I am not sure if it is sufficient).
I don't think you need to completely disallow parallel replication, just adding the full wait at the appropriate place should be sufficient. Another option is to handle the error when it occurs by rolling back and retrying. This is how it is done in optimistic mode (and I would guess that this failure will not occur in optimistic mode). See eg. is_parallel_retry_error() and convert_kill_to_deadlock_error(). In optimistic parallel replication, this particular issue (a DELETE running ahead of an INSERT and getting ER_KEY_NOT_FOUND) is normal, and handled by rolling back the DELETE, doing wait_for_prior_commit(), and trying again. Conservative mode is already somewhat optimistic, since there are some corner cases in InnoDB that can cause conflicts even in conservative mode and require rollback+retry. So internally, "conservative" and "optimistic" are quite similar, and allowing ER_KEY_NOT_FOUND (or any normal error) to trigger a retry is probably an easy change in the code. Handling the error and retrying could be done without penalising any non-conflicting cases, though it is perhaps conceptually a somewhat larger change than just adding the extra wait in conservative RBR with no primary key.
I assume this was done to increase the parallelism: initial phases of DELETE can be ran in parallel with commit/prepare steps of the INSERT. This is safe to
In fact, all of the DELETE can potentially run in parallel with the commit step of the INSERT, and they can even group-commit together which is the important performance gain (but obviously not in cases like this when the transactions are conflicting). Hope this helps, - Kristian.
Kristian, Sergey, hello!
Sergey Petrunia
writes: == Symptoms == When one runs a parallel slave (mode=conservative) and replicates DML for
Nice analysis. Normally it is guaranteed to be safe to start the next transaction in parallel with the commit step of a prior. But as you describe, RBR on table with no primary key is special, since it does not really correspond to any SQL-level operation.
2. Change parallel slave to wait *for commit*. This should only be done if tables that are updated do not support Gap Locking. This is hard, it will require making risky changes on the SQL layer.
I think you can just call thd->wait_for_prior_commit() in the code that does the RBR table scan on a table with no primary key. Eg. Rows_log_event::find_row():
if (m_key_info) { ... } else { DBUG_PRINT("info",("locating record using table scan (rnd_next)")); thd->wait_for_prior_commit(); ... }
(Or do it only once, maybe in Rows_log_event::find_key() or similar, and only if rli->mi->parallel_mode==SLAVE_PARALLEL_CONSERVATIVE).
Right, without the if the OPTIMISTIC could not be called such :-). To me it's the best feasible solution, consistent with the name. Andrei
participants (3)
-
andrei.elkin@pp.inet.fi
-
Kristian Nielsen
-
Sergey Petrunia