[Maria-developers] Transactional INFORMATION_SCHEMA / status variables?
Would it make sense to have transactional behaviour for status variables, and/or information_schema tables? Or does this break the code and/or user expectations too much?

The motivation is to follow up on MWL#116, group commit, which introduces consistent commit order between storage engines and binary log. I want to use this to get a consistent binlog position for START TRANSACTION WITH CONSISTENT SNAPSHOT, without taking any additional locks. Currently, I believe it is common (eg. mysqldump --master-data --single-transaction) to do something like this:

    FLUSH TABLES WITH READ LOCK;
    START TRANSACTION WITH CONSISTENT SNAPSHOT;
    UNLOCK TABLES;
    <take consistent backup by dumping tables using the consistent snapshot>

and this takes a lock that can, depending on circumstances, severely affect the server:

http://www.mysqlperformanceblog.com/2010/04/24/how-fast-is-flush-tables-with...

One idea is to let the binlog storage engine participate in START TRANSACTION WITH CONSISTENT SNAPSHOT, by installing a start_consistent_snapshot() method in its handlerton, and then do something like this:

    START TRANSACTION WITH CONSISTENT SNAPSHOT;
    SELECT variable_value FROM INFORMATION_SCHEMA.SESSION_STATUS
     WHERE variable_name IN ('binlog_master_file', 'binlog_master_position');
    <dump other transactional tables>

If the SELECT of binlog_master_file and binlog_master_position could be transactional, then the binlog engine could return the correct values associated with the consistent snapshot, without blocking any other transactions.

I like the simplicity of this idea, but I do not understand server transaction handling well enough to be sure it will work; some concerns:

- Using SHOW STATUS / INFORMATION_SCHEMA.SESSION_STATUS like this could be surprising to people who do not expect different parts of the results to display different kinds of transactional behaviour. This could be helped by instead using a separate INFORMATION_SCHEMA table for the binlog position.

- But this raises another concern: can an INFORMATION_SCHEMA table be transactional in this respect?

- Also, it seems to me that merely selecting from such a transactional INFORMATION_SCHEMA table would then start a new transaction inside the binlog engine. I wonder if this would cause any unpleasant side effects?

Any suggestions or comments?

----

If this does not work, I have another idea, which I think is more general, but also more complicated to implement.

The idea is that every transaction has a local transaction ID, assigned at the start (we already have this, in thd->transaction.xid_state.xid). Each engine will transactionally store the local transaction ID of the last transaction committed. The binary log will similarly store this ID along with every transaction that is binlogged.

Then START TRANSACTION WITH CONSISTENT SNAPSHOT could optionally return the local transaction ID of the last committed transaction visible in the snapshot obtained. This local transaction ID could then be mapped to a binlog position (with eg. mysqldump), and more generally any binlog plugin could provide a way to map such a local transaction ID into its own global transaction ID.

Similarly, after restoring an InnoDB hot backup or LVM snapshot, one could ask the engine for the last committed local transaction ID, and map this to a binlog position / global transaction ID to be able to use the restored backup to provision a new slave.

This would work with any storage engine and any binlog/replication implementation, without any need for FLUSH TABLES WITH READ LOCK.

- Kristian.
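The local-transaction-ID scheme described above can be sketched as a small toy simulation. All names here (Binlog, Engine, log_commit, etc.) are invented for illustration; this is not server code, just my reading of the design: each engine durably records the last committed local transaction ID, the binlog records the ID per transaction, and after a restore the engine's last ID is mapped back to a binlog position.

```python
import itertools

class Binlog:
    """Toy binlog: records (local_txn_id, position) per binlogged transaction."""
    def __init__(self):
        self.events = []
    def log_commit(self, txn_id):
        pos = len(self.events)  # stand-in for a (file, offset) position
        self.events.append((txn_id, pos))
    def position_for(self, txn_id):
        # Map a local transaction ID back to a binlog position.
        for tid, pos in self.events:
            if tid == txn_id:
                return pos
        raise KeyError(txn_id)

class Engine:
    """Toy storage engine: transactionally stores the last committed local txn ID."""
    def __init__(self):
        self.last_committed_txn_id = None
    def commit(self, txn_id):
        self.last_committed_txn_id = txn_id

txn_ids = itertools.count(1)
binlog, engine = Binlog(), Engine()

# Commit three transactions; commit order is the same in engine and binlog,
# which is what the MWL#116 group commit work guarantees.
for _ in range(3):
    tid = next(txn_ids)
    binlog.log_commit(tid)
    engine.commit(tid)

# After restoring a backup/snapshot of the engine, ask it for the last
# committed local txn ID and map it to the binlog position where a new
# slave should start.
snapshot_tid = engine.last_committed_txn_id
start_pos = binlog.position_for(snapshot_tid)
print(snapshot_tid, start_pos)  # prints: 3 2
```

The key property is that no FLUSH TABLES WITH READ LOCK is needed at backup time; the correlation between engine state and binlog position is recorded at commit time instead.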
Hi, Kristian! On Oct 06, Kristian Nielsen wrote:
Would it make sense to have transactional behaviour for status variables, and/or information_schema tables? Or does this break the code and/or user expectations too much?
The motivation is to follow up on MWL#116, group commit, which introduces consistent commit order between storage engines and binary log.
I want to use this to get a consistent binlog position for START TRANSACTION WITH CONSISTENT SNAPSHOT, without taking any additional locks. Currently, I believe it is common (eg. mysqldump --master-data --single-transaction) to do something like this:
FLUSH TABLES WITH READ LOCK; START TRANSACTION WITH CONSISTENT SNAPSHOT; UNLOCK TABLES <take consistent backup by dumping tables using the consistent snapshot>
and this takes a lock that can, depending on circumstances, severely affect the server.
One idea is to let the binlog storage engine participate in START TRANSACTION WITH CONSISTENT SNAPSHOT, by installing a start_consistent_snapshot() method in its handlerton. And then do something like this:
START TRANSACTION WITH CONSISTENT SNAPSHOT; SELECT variable_value FROM INFORMATION_SCHEMA.SESSION_STATUS WHERE variable_name IN ('binlog_master_file', 'binlog_master_position'); <dump other transactional tables>
If the SELECT of binlog_master_file and binlog_master_position could be transactional, then the binlog engine could return the correct values associated with the consistent snapshot, without blocking any other transactions.
This should be easy to do, I expect. And although it's a change in behavior, I think it'll practically go unnoticed, because it only affects START TRANSACTION WITH CONSISTENT SNAPSHOT.
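To make the intended behaviour concrete, here is a hedged toy sketch (not server code; BinlogEngine, Snapshot and write_event are invented names): the binlog engine's start_consistent_snapshot() hook freezes the position at snapshot time, so a later transactional read of binlog_master_file / binlog_master_position returns the frozen values while the live log keeps moving.

```python
class Snapshot:
    """Holds the binlog position frozen at snapshot creation."""
    def __init__(self, pos):
        self.pos = pos  # (binlog_master_file, binlog_master_position)

class BinlogEngine:
    def __init__(self):
        self.current_pos = ("binlog.000001", 0)
    def write_event(self, nbytes):
        # Concurrent commits advance the live position.
        f, off = self.current_pos
        self.current_pos = (f, off + nbytes)
    def start_consistent_snapshot(self):
        # Analogous to a start_consistent_snapshot() handlerton method:
        # capture the position atomically with snapshot creation.
        return Snapshot(self.current_pos)

engine = BinlogEngine()
engine.write_event(100)
snap = engine.start_consistent_snapshot()  # START TRANSACTION WITH CONSISTENT SNAPSHOT
engine.write_event(50)                     # other transactions keep committing

print(snap.pos)            # prints: ('binlog.000001', 100) -- frozen at snapshot
print(engine.current_pos)  # prints: ('binlog.000001', 150) -- live position
```

No other transaction is blocked at any point; the snapshot merely copies the position under whatever mutex snapshot creation already holds.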
I like the simplicity of this idea, but I do not understand server transaction handling enough to be sure it will work well, some concerns:
- Using SHOW STATUS / INFORMATION_SCHEMA.SESSION_STATUS like this could be surprising to people that do not expect different parts of the results to display different kinds of transactional behaviour. This could be helped by using instead a separate INFORMATION_SCHEMA table for the binlog position.
I think it's not an issue, as only START TRANSACTION WITH CONSISTENT SNAPSHOT is affected.
- But this raises another concern: can an INFORMATION_SCHEMA table be transactional in this respect?
Why not? It simply shows information taken from elsewhere. If that information behaves in a transactional manner, so will the I_S table.
- Also, it seems to me that merely selecting from such transactional INFORMATION_SCHEMA table would then start a new transaction inside the binlog engine. I wonder if this would cause any unpleasant side effects?
This is a bigger change - not more complex, but certainly more noticeable. Currently I_S always shows the current binlog position, not the one from the beginning of a transaction. Changing that may break applications.
---- If this does not work, I have another idea, which I think is more general, but also more complicated to implement.
The idea is that every transaction has a local transaction ID, assigned at the start (we already have this, in thd->transaction.xid_state.xid).
Each engine will transactionally store the local transaction ID of the last transaction committed. The binary log will similarly store this ID along with every transaction that is binlogged.
Then START TRANSACTION WITH CONSISTENT SNAPSHOT could optionally return the local transaction ID of the last committed transaction visible in the snapshot obtained. This local transaction ID could then be mapped to binlog position (with eg. mysqldump), and more generally any binlog plugin could provide a way to map such local transaction ID into its own global transaction ID.
Similarly, after restore of InnoDB hot backup or LVM snapshot, one could ask the engine for last committed local transaction ID, and map this to binlog position / global transaction ID to be able to use the restored backup to provision a new slave.
This would work with any storage engine and any binlog/replication implementation, without any need for FLUSH TABLES WITH READ LOCK.
Uhm, I don't think I understood your idea :( Regards, Sergei
Sergei Golubchik <serg@askmonty.org> writes:
On Oct 06, Kristian Nielsen wrote:
- But this raises another concern: can an INFORMATION_SCHEMA table be transactional in this respect?
Why not? It simply shows information taken from elsewhere. If that information behaves in a transactional manner, so will I_S table.
Right, sounds reasonable, thanks. I think this is a good approach then.
If the SELECT of binlog_master_file and binlog_master_position could be transactional, then the binlog engine could return the correct values associated with the consistent snapshot, without blocking any other transactions.
This should be easy to do, I expect. And although it's a change in behavior, I think, it'll practically go unnoticed - because it only affects START TRANSACTION WITH CONSISTENT SNAPSHOT.
Yes. In fact (at least to my knowledge), the binlog position is currently not available anywhere in SHOW STATUS or INFORMATION_SCHEMA. So I think I will introduce two new status variables:

    binlog_trx_file
    binlog_trx_position

These will be the binlog position corresponding to the point of START TRANSACTION WITH CONSISTENT SNAPSHOT. If no snapshot was started, they will just be the current position.
- Also, it seems to me that merely selecting from such transactional INFORMATION_SCHEMA table would then start a new transaction inside the binlog engine. I wonder if this would cause any unpleasant side effects?
This is a bigger change - not more complex, but certainly more noticeable. Currently I_S always shows the current binlog position, not the one from the beginning of a transaction. Changing that may break applications.
Yeah. Since these would be new status variables, it probably doesn't matter ... On the other hand, I do not think this is particularly useful either. It does not really give anything better than simply asking for the position at the start of the transaction, and it would require taking extra mutexes at the start of every transaction touching the binary log. I think I will just let the status reflect the current position, unless START TRANSACTION WITH CONSISTENT SNAPSHOT has been run, in which case they will return the consistent position.
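The decision above (report the live position, except when a consistent snapshot exists in the session) can be sketched as follows. This is an illustrative toy, assuming invented names (Session, status_binlog_trx); the real variable names proposed in the message are binlog_trx_file / binlog_trx_position.

```python
class Binlog:
    """Toy binlog whose position advances on every commit."""
    def __init__(self):
        self.pos = 0
    def commit(self):
        self.pos += 1

class Session:
    def __init__(self, binlog):
        self.binlog = binlog
        self.snapshot_pos = None  # set by START TRANSACTION WITH CONSISTENT SNAPSHOT
    def start_consistent_snapshot(self):
        self.snapshot_pos = self.binlog.pos
    def status_binlog_trx(self):
        # Proposed semantics: frozen snapshot position if a snapshot was
        # started in this session, otherwise the current live position.
        if self.snapshot_pos is not None:
            return self.snapshot_pos
        return self.binlog.pos

binlog = Binlog()
s = Session(binlog)
binlog.commit()
assert s.status_binlog_trx() == 1  # no snapshot: tracks the live position

s.start_consistent_snapshot()
binlog.commit()
assert s.status_binlog_trx() == 1  # snapshot started: position stays frozen
```

Note that no per-commit bookkeeping is needed for ordinary transactions; only the explicit snapshot pays the (tiny) cost of copying the position.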
If this does not work, I have another idea, which I think is more general, but also more complicated to implement.
Uhm, I don't think I understood your idea :(
Nevermind, I think the simple approach will work. And for the complex idea, we can revisit it when I have worked on it more and thought things through better.

Mark Nielsen <menprojects@gmail.com> writes:
Personally, I could go either way. What matters most on a high-query database is non-blocking behaviour. For example, if the hard drive is acting funny, SHOW MASTER STATUS, SHOW SLAVE STATUS, and certain other SHOW commands can stall. Other SHOW commands, like SHOW VARIABLES and SHOW GLOBAL STATUS, never stall.
When it comes to information schema, the top priority is no stalling. This makes monitoring more reliable. We've had situations where a query detecting the status of a machine would stall, then check again every 5 minutes and stall again, and over time this eats up the available connections with dead ones.
I agree with this. The way this will be implemented, there should be no stalls. InnoDB commits are broken up into two parts: a "fast" part, which only commits the transaction to memory and makes it visible to other transactions, and a "slow" part, which flushes the commit record to disk. The synchronisation with the binlog position only needs to synchronise with the "fast" part; the "slow" part can run independently.

- Kristian.
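The fast/slow commit split above can be modelled with a tiny threading sketch. This is my reading of the argument, not InnoDB source: the lock that snapshot readers contend on protects only the in-memory "fast" part, while the disk-flush "slow" part runs without holding it, so readers never wait on I/O.

```python
import threading
import time

commit_lock = threading.Lock()  # protects only the fast, in-memory part
binlog_pos = 0

def commit_transaction():
    global binlog_pos
    with commit_lock:
        binlog_pos += 1  # "fast" part: commit visible, binlog position advanced
    time.sleep(0.05)     # "slow" part: fsync-like flush, no lock held

def read_snapshot_position():
    # A snapshot reader only needs the fast-part lock, so it is never
    # blocked behind the disk flush.
    with commit_lock:
        return binlog_pos

t = threading.Thread(target=commit_transaction)
t.start()
start = time.monotonic()
pos = read_snapshot_position()  # returns promptly even while the flush runs
elapsed = time.monotonic() - start
t.join()
print(pos)
```

Depending on scheduling, the reader may observe the position before or after the concurrent fast commit (0 or 1 here), but in neither case does it wait for the 50 ms flush.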
Personally, I could go either way. What matters most on a high-query database is non-blocking behaviour. For example, if the hard drive is acting funny, SHOW MASTER STATUS, SHOW SLAVE STATUS, and certain other SHOW commands can stall. Other SHOW commands, like SHOW VARIABLES and SHOW GLOBAL STATUS, never stall.

When it comes to information schema, the top priority is no stalling. This makes monitoring more reliable. We've had situations where a query detecting the status of a machine would stall, then check again every 5 minutes and stall again, and over time this eats up the available connections with dead ones.

When I hear about transactions with information schema, I get a little antsy because of the possibility of stalls. I'd rather it be slightly inaccurate, or have a variable to turn the transactional behaviour on and off.

Mark

On Wed, Oct 6, 2010 at 3:03 AM, Kristian Nielsen <knielsen@knielsen-hq.org> wrote:

Would it make sense to have transactional behaviour for status variables, and/or information_schema tables? Or does this break the code and/or user expectations too much?
participants (3)
- Kristian Nielsen
- Mark Nielsen
- Sergei Golubchik