[Maria-developers] TRUE LOCK=NONE for slave
Hello everyone! I am thinking of implementing true lock=none for slaves. Currently Alter command is only executed on slave when master commits it. I(thanks marko for suggesting this idea) am thinking of writing BEGIN_DDL_EVENT in binlog after master hits ha_prepare_inplace_alter_table. Then master will write QUERY_EVENT on binlog with actual alter query . On commit/rollback master will write COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT. On slave there will be pool of threads(configurable global variable), which will apply these DDLs. On receiving BEGIN_DDL_EVENT slave thread will pass the QUERY_EVENT to one of the worker thread. Worker thread will execute until ha_inplace_alter_table. Actual commit_inplace_alter will be called by sql thread. If sql thread recieve some kind of rollback event , then it will somehow signal worker thread to stop executing alter. If none of the worker threads are available then event will be en-queued, then If we received rollback event the we will simply discard event from queue, If we received commit event then SQL thread will synchronously process DDL event. Please let me know what you think. Credits for idea Marko and Lixun. Regards sachin
Hi, MDEV ticket for this task https://jira.mariadb.org/browse/MDEV-11675 Regards sachin On Thu, Dec 29, 2016 at 3:08 PM, Sachin Setiya <sachin.setiya@mariadb.com> wrote:
Hello everyone!
I am thinking of implementing true lock=none for slaves. Currently Alter command is only executed on slave when master commits it. I(thanks marko for suggesting this idea) am thinking of writing BEGIN_DDL_EVENT in binlog after master hits ha_prepare_inplace_alter_table. Then master will write QUERY_EVENT on binlog with actual alter query . On commit/rollback master will write COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT.
On slave there will be pool of threads(configurable global variable), which will apply these DDLs. On receiving BEGIN_DDL_EVENT slave thread will pass the QUERY_EVENT to one of the worker thread. Worker thread will execute until ha_inplace_alter_table. Actual commit_inplace_alter will be called by sql thread. If sql thread recieve some kind of rollback event , then it will somehow signal worker thread to stop executing alter. If none of the worker threads are available then event will be en-queued, then If we received rollback event the we will simply discard event from queue, If we received commit event then SQL thread will synchronously process DDL event.
Please let me know what you think. Credits for idea Marko and Lixun.
Regards sachin
Sachin Setiya <sachin.setiya@mariadb.com> writes:
I am thinking of implementing true lock=none for slaves. Currently
Can you explain what you mean by "true LOCK=NONE"? It is not clear from your description. I think you mean something that will allow to run an ALTER TABLE on the slave in parallel with the same statement on the master?
Alter command is only executed on slave when master commits it. I(thanks marko for suggesting this idea) am thinking of writing BEGIN_DDL_EVENT in binlog after master hits ha_prepare_inplace_alter_table. Then master will write QUERY_EVENT on binlog with actual alter query . On commit/rollback master will write COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT.
There will be a number of things to consider with this. Previously, all transactions in the binlog have been atomic; this introduces related events that can be arbitrarily separated in the binlog. For example, suppose the slave stops in the middle between BEGIN_DDL_EVENT and COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT. What is then the slave's binlog position / GTID position? You should probably make a list of all relevant issues, along with possible solutions.
On slave there will be pool of threads(configurable global variable), which
Hopefully, the exiting pool of slave worker threads (currently used for parallel replication) can be used for this as well? Doesn't seem right to introduce a new kind of thread pool...
will apply these DDLs. On receiving BEGIN_DDL_EVENT slave thread will pass the QUERY_EVENT to one of the worker thread. Worker thread will execute until ha_inplace_alter_table. Actual commit_inplace_alter will be called by sql thread.
Won't you need some mechanism to prevent following events to access the table before ALTER TABLE in a worker thread can acquire the metadata lock? There are some nasty issues to be aware of with potential deadlocks related to FLUSH TABLES WITH GLOBAL READ LOCK and such, when multiple threads are trying to coordinate waiting and locking; there were already some quite hard issues with this for parallel replication.
If sql thread recieve some kind of rollback event , then it will somehow signal worker thread to stop executing alter. If none of the worker threads are available then event will be en-queued, then If we received rollback event the we will simply discard event from queue, If we received commit event then SQL thread will synchronously process DDL event.
Another mechanism that currently exists for improving replication speed of ALTER TABLE (and other long-running queries) is out-of-order parallel replication, using gtid_domain_id. This allows a DBA to run the ALTER TABLE in a separate GTID domain, which will allow the ALTER to run on the slave in parallel with any other queries. But unlike your suggestion, it does not allow the ALTER on the slave to run in parallel with the one on the master, and it is not fully automatic; the DBA needs to SET SESSION gtid_domain_id and ensure that no conflicting transactions are run for the duration of the ALTER. Hope this helps, - Kristian.
Hi,
On 29 Dec 2016, at 10:38, Sachin Setiya <sachin.setiya@mariadb.com> wrote:
I am thinking of implementing true lock=none for slaves. Currently Alter command is only executed on slave when master commits it. I(thanks marko for suggesting this idea) am thinking of writing BEGIN_DDL_EVENT in binlog after master hits ha_prepare_inplace_alter_table. Then master will write QUERY_EVENT on binlog with actual alter query . On commit/rollback master will write COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT.
On slave there will be pool of threads(configurable global variable), which will apply these DDLs. On receiving BEGIN_DDL_EVENT slave thread will pass the QUERY_EVENT to one of the worker thread. Worker thread will execute until ha_inplace_alter_table. Actual commit_inplace_alter will be called by sql thread. If sql thread recieve some kind of rollback event , then it will somehow signal worker thread to stop executing alter. If none of the worker threads are available then event will be en-queued, then If we received rollback event the we will simply discard event from queue, If we received commit event then SQL thread will synchronously process DDL event.
Please let me know what you think. Credits for idea Marko and Lixun.
This idea has been discussed several times in the past (a way to trigger the action to be kicked off on all the slaves while it is running on the master still) with different people at MariaDB and elsewhere and it sounds very attractive. It’s currently one of the reasons that INLINE ALTER TABLE is rarely done where I work as it generates a replication lag on slaves so is not acceptable and running the task in isolation separately on each server in a replication chain requires careful management. From the operational perspective a couple of questions come up: * this behaviour should be configurable? - new global variable on the master to allow injection of this changed event stream? - a new session variable set in the session where the command is triggered ? - on slave a setting to indicate how many INPLACE ALTER TABLEs can run at once? * how does a DBA monitor what’s going on? - the progress and number of active DDL statements - please consider adding counters/metrics for: * number of “asynchronous DDLs” in progress / completed successfully / failed or rolled back * sizes of the DDL changes made so far * number of threads busy in this state (maybe can be implied from SHOW PROCESSLIST or equivalent but nicer to have an easy to query metric) * what happens if the commit/rollback ddl event never arrives? (the statement on the master could be skipped for one of several reasons) - I assume that users on the slave see the old structure all the time until it completes - would this have a storage or performance penalty on the slave if the commit/rollback DDL event never arrives? - can a “commit” / “rollback” be manually triggered by the DBA in such circumstances? * what happens if the server crashes while this process (or these processes) are ongoing when the server starts up again? Thanks for clarifying ideas on this as while the idea sounds very attractive it helps enormously if monitoring of status is clear and easy to achieve. Regards, Simon
participants (3)
-
Kristian Nielsen
-
Sachin Setiya
-
Simon J Mudd