[Maria-developers] RFC: new replication feature "per-engine mysql.gtid_slave pos"
I plan to implement MDEV-12179, per-engine mysql.gtid_slave_pos. Here is a description of the high-level design, as a request for comments and/or suggestion for changes. The purpose of this is to fix a serious performance issue in replication when multiple storage engines are used. Every replicated transaction updates the replication GTID position in table mysql.gtid_slave_pos. There is a large overhead if that table is using another storage engine than the rest of the transaction, as then a cross-engine XA transaction needs to be done. But if more than one storage engine is used on a server, some replicated transactions will necessarily end up being cross-engine. This problem becomes more interesting to fix with the new interest in MyRocks. Though it already exists also for eg. people using MyISAM or TokuDB simultaneously with InnoDB, for example. The basic idea is to create multiple copies of the table, eg. mysql.gtid_slave_pos_innodb and mysql.gtid_slave_pos_myrocks. This way, an InnoDB transaction can update the InnoDB version of the table, and similar for other engines. To enable the feature, the DBA would create extra copies of the table for the engines she wants: CREATE TABLE mysql.gtid_slave_pos_myrocks LIKE mysql.gtid_slave_pos; ALTER TABLE mysql.gtid_slave_pos_myrocks ENGINE=myrocks; The server will look for all tables called mysql.gtid_slave_pos* at startup, as well as on each START SLAVE. This method makes migration from earlier versions simple. MariaDB already has the ability to work with multiple redundant rows in the mysql.gtid_slave_pos table. The column "sub_id" provides a version to identify the most recent entry. So the server just needs to read multiple tables (if present), and pick the newest version amongst all rows. Thus, the new feature is compatible with older versions without any schema changes. I think it is better than automatically creating copies of the table for all engines; it is often the case that an engine (eg. TokuDB) is available in the server binary, but the user does not want any tables created in that engine. This means that no speedup is obtained if the user does not create additional mysql.gtid_slave_pos* tables. (However, things should still work fine, thanks to XA synchronising cross-engine transactions). - Kristian.
Hi, On Mon, Mar 6, 2017 at 5:11 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
To enable the feature, the DBA would create extra copies of the table for the engines she wants:
CREATE TABLE mysql.gtid_slave_pos_myrocks LIKE mysql.gtid_slave_pos; ALTER TABLE mysql.gtid_slave_pos_myrocks ENGINE=myrocks;
What about a configuration setting in my.cnf that will do the above magically? I would prefer not making direct changes to the mysql database. Thanks, -will -- Will Fong, Senior Support Engineer MariaDB Corporation
Will Fong <will.fong@mariadb.com> writes:
What about a configuration setting in my.cnf that will do the above magically?
I would prefer not making direct changes to the mysql database.
I agree that this is not ideal. But do you have a suggestion for how the semantics of such an option should be? Too much magic is not good either. For example, if the user does a blackhole engine transaction, do we really want a blackhole mysql.gtid_slave_pos? Or archive engine, which does not support deletes? One thing that could be done is to supply a standard stored procedure to add an engine table, to avoid direct changes to the mysql database: mysql.gtid_pos_add_engine("MyRocks"); This would create the table, and could also check eg. that a table does not already exist etc. This seems better. But even better woulf be if most users did not have to do anything to get the improved performance. Maybe we could just have a hardcoded list of engines that are suitable for mysql.gtid_slave_pos (eg. innodb, tokudb, myrocks). And replication will automatically create a suitable mysql.gtid_slave_pos_XXX if it sees a transaction in one of those engines, unless --skip-gtid-auto-create-pos-table is set. For more exotic/experimental engine, the user can manually call mysql.gtid_pos_add_engine() if desired. So to summarise - Auto create mysql.gtid_slave_pos_{innodb,tokudb,myrocks} if a transaction is seen in one of those engines - A --skip-gtid-auto-create-pos-table to disable this auto behaviour - A stored procedure mysql.gtid_pos_add_engine() to facilitate manually adding an engine table. What do you think, does this sound ok? Suggestions for better ways? And thanks for commenting, this helped me rethink this part of the design which I was not very happy with. - Kristian.
Hi Kristian! Le 08/03/2017 à 09:58, Kristian Nielsen a écrit :
Will Fong <will.fong@mariadb.com> writes:
But even better woulf be if most users did not have to do anything to get the improved performance. Maybe we could just have a hardcoded list of engines that are suitable for mysql.gtid_slave_pos (eg. innodb, tokudb, myrocks). And replication will automatically create a suitable mysql.gtid_slave_pos_XXX if it sees a transaction in one of those engines, unless --skip-gtid-auto-create-pos-table is set. For more exotic/experimental engine, the user can manually call mysql.gtid_pos_add_engine() if desired. Why not using the XA support flag of the engine to check if we should create the gtid_slave_pos_{engine} file ?
Jocelyn
jocelyn fournier <jocelyn.fournier@gmail.com> writes:
Why not using the XA support flag of the engine to check if we should create the gtid_slave_pos_{engine} file ?
The only additional engine currently that supports XA is Spider, if my grep skills are not failing me. But Spider does not seem appropriate for automatically creating mysql.gtid_slave_pos_spider. I am not even sure it is possible to create a Spider table without specifying some remote server to connect to. In general, I am fine with choosing auto-creating engines based on some generic criteria, if that is what people wants. It just seems to me better to write in the documentation an explicit list of engines that users can expect to have automatically created tables for, to try to avoid unpleasant surprises. Like this one with the Spider engine. So adding an engine that automatically creates a table requires a conscious decision. Of course, normally we try to avoid hardcoded lists of specific engines. But we do have it in some places. For example, there is a list of engines that are built-in to the server by default. Maybe auto-creating mysql.gtid_slave_pos could be considered similar. But let me know what you think. - Kristian.
Hi Kristian! On Wed, Mar 8, 2017 at 4:58 PM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
One thing that could be done is to supply a standard stored procedure to add an engine table, to avoid direct changes to the mysql database:
mysql.gtid_pos_add_engine("MyRocks");
My concern is about configuration management and automated deployments. Having an extra "query" step is not so ideal...
So to summarise
- Auto create mysql.gtid_slave_pos_{innodb,tokudb,myrocks} if a transaction is seen in one of those engines
- A --skip-gtid-auto-create-pos-table to disable this auto behaviour
- A stored procedure mysql.gtid_pos_add_engine() to facilitate manually adding an engine table.
What do you think, does this sound ok? Suggestions for better ways?
I like this idea very much!!! Thanks, -will -- Will Fong, Senior Support Engineer MariaDB Corporation
Hi all, On 9 March 2017 at 04:47, Will Fong <will.fong@mariadb.com> wrote:
So to summarise
- Auto create mysql.gtid_slave_pos_{innodb,tokudb,myrocks} if a
On Wed, Mar 8, 2017 at 4:58 PM, Kristian Nielsen transaction
is seen in one of those engines
- A --skip-gtid-auto-create-pos-table to disable this auto behaviour
- A stored procedure mysql.gtid_pos_add_engine() to facilitate manually adding an engine table.
What do you think, does this sound ok? Suggestions for better ways?
I like this idea very much!!!
I also like it very much. Let me add my 2 cents: I think skip-gtid-auto-create-pos-table should be enabled (skipping creation) by default in the 1st release of the new "per-engine mysql.gtid_slave_pos" feature. This respects the "principle of least surprise" when upgrading to a new version and could be changed in a next major version. To complete that, maybe adding some counters (statues) about: - number of transactions not spawning storage engine (could benefit the new feature), - number of cross storage engine transactions (needed XA, but exclusing mysql.gtid_slave_pos*) - number of transactions not spawning storage engine that updated a mysql.gtid_slave_pos in another SE (because skip-gtid-auto-create-pos-table=1) - maybe more... Many thanks, Jean-François Gagné System Engineer @ Booking.com
how about --gtid_auto_create_engine_list= default value = innodb,tokudb (and the stored proc) /Jonas On Wed, Mar 8, 2017 at 9:58 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:
Will Fong <will.fong@mariadb.com> writes:
What about a configuration setting in my.cnf that will do the above magically?
I would prefer not making direct changes to the mysql database.
I agree that this is not ideal.
But do you have a suggestion for how the semantics of such an option should be? Too much magic is not good either.
For example, if the user does a blackhole engine transaction, do we really want a blackhole mysql.gtid_slave_pos? Or archive engine, which does not support deletes?
One thing that could be done is to supply a standard stored procedure to add an engine table, to avoid direct changes to the mysql database:
mysql.gtid_pos_add_engine("MyRocks");
This would create the table, and could also check eg. that a table does not already exist etc. This seems better.
But even better woulf be if most users did not have to do anything to get the improved performance. Maybe we could just have a hardcoded list of engines that are suitable for mysql.gtid_slave_pos (eg. innodb, tokudb, myrocks). And replication will automatically create a suitable mysql.gtid_slave_pos_XXX if it sees a transaction in one of those engines, unless --skip-gtid-auto-create-pos-table is set. For more exotic/experimental engine, the user can manually call mysql.gtid_pos_add_engine() if desired.
So to summarise
- Auto create mysql.gtid_slave_pos_{innodb,tokudb,myrocks} if a transaction is seen in one of those engines
- A --skip-gtid-auto-create-pos-table to disable this auto behaviour
- A stored procedure mysql.gtid_pos_add_engine() to facilitate manually adding an engine table.
What do you think, does this sound ok? Suggestions for better ways?
And thanks for commenting, this helped me rethink this part of the design which I was not very happy with.
- Kristian.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Jonas Oreland <jonaso@google.com> writes:
how about --gtid_auto_create_engine_list=
default value = innodb,tokudb
(and the stored proc)
Yes, I like that a lot better, thanks for the suggestion. And following Jean-François' suggestion, I guess the default would be empty in the first version, and could later be extended to some reasonable non-empty list. - Kristian.
I have now most of the implementation of MDEV-12179 done. I wanted to present the way the feature now looks, and point to the code, in case there are any further comments on the design or implementation before it is finalised. To recap, the idea is to improve performance when using multiple transactional storage engines (eg. InnoDB/TokuDB/MyRocks) on the same replication slave, but not in the same transactions. By allowing more than one mysql.gtid_slave_pos* tables (one for each used engine), costly cross-engine transactions can be avoided. The feature is enabled by a system variable: --gtid-pos-auto-engines=innodb,myrocks This way, if the replication slave sees an innodb or myrocks transaction, it will create a new table mysql.gtid_slave_pos_innodb or mysql.gtid_slave_pos_myrocks in which to record the GTID position. For monitoring the feature, I added new status variables: Transactions_multi_engine Number of transactions that changed data in multiple (transactional) storage engines. Rpl_transactions_multi_engine Number of replicated transactions that involved changes in multiple (transactional) storage engines, before considering the update of the mysql.gtid_slave_posXXX table. Transactions_gtid_foreign_engine Number of replicated transactions where the update of the mysql.gtid_slave_posXXX table had to choose a storage engine that did not otherwise participate in the transaction. The current code is here: https://github.com/knielsen/server/tree/mdev12179 Following previous discussions, the default of --gtid-pos-auto-engines is currently empty (no automatic creation of tables by default). If desired, we could later enable auto-creation of a select set of storage engines (eg "innodb,myisam"). The auto-creation happens asynchroneously, in the background. Auto-creation should be a very rare event, and I spent some effort to minimise the overhead of this auto-creation on the normal processing of replicated transactions. On server start, as well as on START SLAVE, the available mysql.gtid_slave_pos* tables are auto-discovered and read as needed. Originally, I had the idea to supply a store procedure mysql.gtid_pos_add_engine() which would allow the DBA to explicitly create a table for a specific engine. But it seems such stored procedure is not much needed when --gtid-pos-auto-engines is available, and we do not have a prior history of providing such standard stored procedures. It could be easily added later, if desired. - Kristian.
I have now pushed the code to 10.3. It should appear in an upcoming MariaDB 10.3.1 release, IIUC. Following the discussion so far, the default for --gtid-pos-auto-engines is currently empty. It can be easily changed later (eg. to innodb,tokudb,rocksdb) simply by changing the default value in sql/mysqld.cc. The code will allow this default setting even if eg. tokudb or rocksdb is not enabled. (There might be a need to adjust a couple test cases that specially manipulates mysql.gtid_slave_pos in devious ways). Let me know in case of any problems or if there are comments/questions. - Kristian.
In order to document this properly (and in response to https://jira.mariadb.org/browse/MDEV-13236) , I want to clarify a few things: As I understand it: * the variable is dynamic, so changes becomes effective immediately * tables are created immediately whenever a transaction is seen from a specified engine (listed in the variable) * the manual creation of the tables was an initial design suggestion, and there should be no need to ever create a table manually. I don't have 10.3.1 running yet to test, but some questions: * what happens if slave threads are not stopped before changing the variable? Just delay the creation of the table, or can something unpleasant happen from a table being created in the middle of something? * what happens if a storage engine is later removed from the variable? If the table still exists, I presume the existence of the table takes priority and the table is still used - the variable is just used to check the existence of a table and create it if needs be, not determine its use? * what benefit does the variable being empty (the default) provide? I saw the comment about the principle of least surprise, but in what way could it being populated and the tables existing be negative? In other words, why would one not simply recommend setting this immediately? * If there is a benefit to it being empty (and therefore no multiple tables created), will there be a mechanism to remove a table beyond manually deleting? On 03/07/2017 15:15, Kristian Nielsen wrote:
I have now pushed the code to 10.3. It should appear in an upcoming MariaDB 10.3.1 release, IIUC.
Following the discussion so far, the default for --gtid-pos-auto-engines is currently empty. It can be easily changed later (eg. to innodb,tokudb,rocksdb) simply by changing the default value in sql/mysqld.cc. The code will allow this default setting even if eg. tokudb or rocksdb is not enabled.
(There might be a need to adjust a couple test cases that specially manipulates mysql.gtid_slave_pos in devious ways).
Let me know in case of any problems or if there are comments/questions.
- Kristian.
Ian Gilfillan <ian@mariadb.org> writes:
In order to document this properly (and in response to
Great! Thanks for looking into this.
https://jira.mariadb.org/browse/MDEV-13236) , I want to clarify a few things:
The main documentation is here, I'm sure it can be improved or made more visible: https://mariadb.com/kb/en/mariadb/mysqlgtid_slave_pos-table/
As I understand it: * the variable is dynamic, so changes becomes effective immediately
Yes. Note that changing it requires that all slaves are stopped. So in that sense, it only takes effect once slaves are running again.
* tables are created immediately whenever a transaction is seen from a specified engine (listed in the variable)
Yes, for some value of "immediately". As mentioned in https://mariadb.com/kb/en/mariadb/mysqlgtid_slave_pos-table/, the creation is asynchroneous. What this means is that, the need to create a table is discovered inside record_gtid(), which is deep in the replication of the transaction when it is time to commit it. It is not a good place to try to create a table. So the transaction continues without the new table, and the table creation is started in a background thread. So most likely the table will only be available a bit later, after that transaction (and possibly a few other) have been replicated. Usually, the table should be created in a fraction of a second. Also note that thanks to two-phase commit, the system is behaving correctly (in terms of crash safety and consistency) also for the transactions that complete prior to the new table being available.
* the manual creation of the tables was an initial design suggestion, and there should be no need to ever create a table manually.
Yes, generally there should be no need to ever create a table manually. The initial table is created by mysql_install_db (or mysql_upgrade), and --gtid-pos-auto-engines can create any other tables required. The one situation I can think of where manual action makes sense is as explained in the documentation, in a system using primarily MyISAM, but the database was created as InnoDB. In that case it can make sense to ALTER the table to MyISAM (but not to create a table manually).
I don't have 10.3.1 running yet to test, but some questions: * what happens if slave threads are not stopped before changing the variable? Just delay the creation of the table, or can something unpleasant happen from a table being created in the middle of something?
As explained in the documentation, this results in an error that slaves must be stopped before changing the variable, similar to many other replication variables.
* what happens if a storage engine is later removed from the variable? If the table still exists, I presume the existence of the table takes priority and the table is still used - the variable is just used to check the existence of a table and create it if needs be, not determine its use?
Yes. The variable only concerns the _creation_ of the table. The _use_ of the table is automatic whenever the table is there.
* what benefit does the variable being empty (the default) provide? I saw the comment about the principle of least surprise, but in what way could it being populated and the tables existing be negative? In other words, why would one not simply recommend setting this immediately?
As I wrote before, I do not have any strong opinions on this. The feature is designed to work well in all cases. Generally a default of --gtid-pos-auto-engines=innodb,tokudb,rocksdb should be beneficial. It was always a sore spot in the original GTID implementation that there was no really good answer to which engine the mysql.gtid_slave_pos table should be created with, and it has caused real problem as well. If the mysql.gtid_slave_pos table was created as MyISAM, and --gtid-pos-auto-engines=innodb,tokudb,rocksdb set as default, then there will be a decent solution to this issue. There are also some arguments in favour of an empty default, at least initially until more testing has been done. MariaDB replication is extremely complex code, there can be bugs. As you say, there is the principle of least surprise. New tables magically springing into life during replication does have some element of unpleasant surprise. And do not forget that this feature is mostly/only useful to users that want to use TokuDB and RocksDB (both still relatively new), _and_ at the same time use InnoDB, in a performance-critical setting. Not a common usecase. Technically, setting a non-empty default is just putting the right default in sql/mysqld.cc, the code is already written to handle it. The current empty default is based on the feedback from the users who took an interest in the feature so far.
* If there is a benefit to it being empty (and therefore no multiple tables created), will there be a mechanism to remove a table beyond manually deleting?
There is currently no such mechanism, and there should not be a need to remove a table, I think. There _is_ code that tries to correcly handle if a table is manually removed. Such removal is best done with slave threads stopped. And the user would be responsible that the rows in the table are either no longer needed, or alternatively that they are migrated to the other tables. Generally, it should not be recommended to ever do such manual removal, just like it is not a good idea to manually remove or otherwise mess with other tables in the mysql schema. That is why it is not documented - just like the effect of removing other mysql.* tables is not documented. I hope this helps. Feel free to ask again if you have more questions or if something was unclear, and thanks for helping improve the documentation. - Kristian.
On 03/07/2017 15:15, Kristian Nielsen wrote:
I have now pushed the code to 10.3. It should appear in an upcoming MariaDB 10.3.1 release, IIUC.
Following the discussion so far, the default for --gtid-pos-auto-engines is currently empty. It can be easily changed later (eg. to innodb,tokudb,rocksdb) simply by changing the default value in sql/mysqld.cc. The code will allow this default setting even if eg. tokudb or rocksdb is not enabled.
(There might be a need to adjust a couple test cases that specially manipulates mysql.gtid_slave_pos in devious ways).
Let me know in case of any problems or if there are comments/questions.
- Kristian.
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
participants (6)
-
Ian Gilfillan
-
Jean-Francois B. Gagne
-
jocelyn fournier
-
Jonas Oreland
-
Kristian Nielsen
-
Will Fong