Re: [Maria-discuss] Per-db binlogging
Sergei, Thank you very much for your reply! A later post by Henrik Ingo seems to say that I should cc maria-discuss@lists.launchpad.net. I'll try it and see if it bounces--I THINK I'm registered with launchpad.net :-) Let me know if sending to both lists annoys anyone, and I will quit the cc :-) See my responses below On Wed, Sep 15, 2010 at 1:53 PM, Sergei Golubchik <serg@askmonty.org> wrote:
Hi, Weldon!
On Sep 13, Weldon Whipple wrote:
<snip/>
Is it plausible to identify all places where a binlog addition might occur? In those places, if the singleton USER_BIN_LOG_MRG exists, ask it if the current DB is being binlogged. If so, send the request to the DB_BIN_LOG instance.
By "addition" you mean when an event is written to binlog?
Yes! (exactly)--any event relating to that database that is written to binlog. (Hmm ... I wonder if there might ever be more "global" events that should be written ... Maybe we don't care about them??). Also, since these databases are "tied" to a user on a box, we probably don't care about renaming a DB ...
See where mysql_bin_log.write is called.
Excellent information! (I was hoping it might be that easy ???). If I succeed in making a DB_BINLOG class derive from MYSQL_BIN_LOG (with an instance of DB_BINLOG for each database that is currently binlogging), I could--in theory, at least--just call something like db_binlog->write () at those places.
Questions I Have. ----------------
1. I notice that binlog is in the list of engines and plugins (displayed by SHOW ...). The bottom of log.cc has "mysql_declare_plugin(binlog)". Do I need one for db_plugin? (probably not?)
depends. binlog needs to be declated a plugin, to be able to pretent being a storage engine, to be able to force two phase commit (2PC).
2PC is used to commit or rollback a transaction atomically in more than one engine. And MySQL only uses 2PC when a transaction spans more than one engine. But we want a transaction to be committed in the engine and written to a binlog - and it should be done atomically too. So, binlog pretends to be an engine, pretends to participate in a transaction, which forces MySQL to use 2PC which can guarantee that a transaction is either committed and written to a binlog - or not committed and not written. Reliably, even if MySQL crashes in the middle.
So, if you don't use innodb, or don't use innodb_flush_log_at_trx_commit=1 or don't use sync_binlog=1 you probably don't need to support 2PC as your setup is not crash proof anyway.
If you care about innodb<->binlog consistency in the presence of crashes, you should support 2PC in your binlog, as explained above.
We DO have a small percentage of innodb databases. If 2PC commits (rolls back, etc.) atomically to more than one engine, does that mean TWO (only)? ... or could it happen for three (specifically, innodb, MYSQL_BIN_LOG's binlog, and DB_BIN_LOG's binlog)? The two kinds of binlog files would have the same format (except that one is for only one database). Are they considered two storage engines/plugins? (I notice that the "mysql_declare_plugin(binlog)" declaration is at the bottom of log.cc. Most of the other storage engines' corresponding declarations are in files in the storage subdirectories. If I were to need a "mysql_declare_plugin(db_binlog)" declaration, could it also go at the bottom of log.cc? ... Or (since a DB_BINLOG is AKO [a kind of] MYSQL_BIN_LOG [in the OO sense]), are MYSQL_BIN_LOGs and DB_BINLOGs both kinds of binlog plugins ...?
2. I've diagrammed the binary log event taxonomy--the 24 classes that inherit from Log_event (in log_event.cc/h). That was my main motivation for wanting to inherit from MYSQL_BIN_LOG.
Was that a question? :)
No. Just justification for my desire to simplify by having DB_BINLOG extend the MYSQL_BIN_LOG class. (I realize that I MIGHT have to add "virtual" in front of a method or two--possibly. Hopefully I would need to make no--or minimal-- changes to MYSQL_BIN_LOG.)
3. On the other hand, I wonder if inheriting from MYSQL_BIN_LOG might be overkill for "master only" binlogging--without the slave/replication part of it.
No, I think inheriting from MYSQL_BIN_LOG is correct. At least it's much easier to build on existing binlogging than to implement everything from scratch.
Amen!!
4. What version should I be working on? (We're currently running 5.1.47.) What should I check out of bzr/launchpad? (Is that still what we're doing with the changes in ownership?)
What version you plan to use? If it'll be 5.1 - you have to work in 5.1, use lp:mysql-server/5.1 branch:
bzr branch lp:mysql-server/5.1
We TRY to keep as current as possible (at the company). At some point, we'll need to move to the next major release--whatever it ends up being. (Then there is also the unknown [to me] of MySQL vs. MariaDB ... :-) We'll use bzr branch lp:mysql-server/5.1
5. I see that THD is involved in binlogging. (Several THD methods are defined in log.cc.) Are there any caveats here?
There is a binlog cache (or buffer) in THD that stores binlog events for the current transaction. They all are written to disk together, when a transaction is committed. You need to decide what you want to do when the current database is changed in the middle of a transaction - what binlog the transaction should be written to.
By "the current database is changed", I imagine you refer to the "USE 'database';" command. That's a good question. I THINK we have decided to ignore (for the present, at least), queries (and also transactions?) that involve multiple databases. Because we are doing per-DB binlogging, if multiple databases are involved, we (almost) consider it an anomaly at our ISP ... (I'm thinking that we might need to consider THD's binlog cache only for transactions that involve innodb and no more than one database ...)
6. What files should I use? A new h/cc pair? ... or just add to the hierarchy in log.h/cc?
As you like. New pair of files is cleaner. But it may require you to move some declarations from log.cc to a header file, or to make some static functions extern.
All my prototypes have just expanded what is in log.cc and log.h. (I came VERY close to creating db_binlog.cc/h, however, which I would have done in some of my past--non-MySQL--projects). However, noticing most of the logging hierarchy in the single log.cc/h pair, I decided the implementors must have done it for a reason, and have decided to follow suit--for now, at least.
What I've Implemented So Far (about 13 Prototypes). --------------------------------------------------
(PRELIMINARY NOTE:)
Our initial plan was to call it "Per-User Binlogging". The user here is a cPanel [control panel] user. cPanel's database naming convention is to have all of a user's database names start with the username, followed by an underscore, followed by a distinctive database name. Thus all the commands above start with
<verb> USER ...
Originally most of them could end either "DATABASE <db>" or "USER <user>" (with user meaning cPanel user). Thus, in the original plan the first command above was:
START USER BINLOG FOR (USER <user> | DATABASE <db>)
We have realized that per-database binlogging is all we really need. We imagine that others in our situation might want something similar. Since not everyone uses cPanel, we should forget per[cPanel]-user binlogging, focusing on per-database binlogging.
If every user is a different MySQL account - that it a different row in the mysql.user table - than per-user binlogging is much cleaner and better defined concept.
A statement can affect more than one database (say, UPDATE db1.t1, db2.t2 SET ...), the current database may be changed in the middle of transaction, and so on. On the other hand, any statement or a transaction is always executed in the context of as single user - you cannot change user in the middle of transaction. (*)
(*) changing security context only affect privileges, but not the destination binlog.
Actually (and this is because of the way cPanel does things--I have been "bitten" by this many times!!), each database has a (different) corresponding user in the mysql.user table (!!!). So (for example), if a user signs up with our ISP with a user name of elvispre every one of his (perhaps hundreds/thousands of) databases will have names something like: elvispre_friends elvispre_earlyyears elvispre_hitsongs . . . And (here's the surprising part)--each one of those databases will have a MATCHING ROW IN mysql.user!! So, for us to migrate our customer elvispre to another physical server, we DO (ultimately) need to binlog all of the databases/users that match/are like 'elvispre\_%'; . In some of my earlier prototypes, the START/FLUSH-LOCK/SHOW/UNLOCK/STOP commands have allowed for "like" clauses at the end. Maybe I need to go that route? ALL of my prototypes have had binlogging PER DATABASE. I wonder how difficult/feasible/desirable/useful it would be to create separate binlog files for databases (or users) whose name matches a "like" clause. ... Probably more important: Would this feature be of general interest to the MySQL user community at large? (... and, if so, should it be database- or user- or database_pattern- or user_pattern-based). (My initial idea is that per [single] database/user might be easiest to define/implement??)
Regards, Sergei
Thanks again for your reply!! Weldon
Hi, Weldon! On Sep 15, Weldon Whipple wrote:
See where mysql_bin_log.write is called.
Excellent information! (I was hoping it might be that easy ???). If I succeed in making a DB_BIN_LOG class derive from MYSQL_BIN_LOG (with an instance of DB_BIN_LOG for each database that is currently binlogging), I could--in theory, at least--just call something like db_binlog->write () at those places.
Better make this write method virtual and implement it in your derived class.
Questions I Have. ----------------
1. I notice that binlog is in the list of engines and plugins (displayed by SHOW ...). The bottom of log.cc has "mysql_declare_plugin(binlog)". Do I need one for db_plugin? (probably not?)
depends. binlog needs to be declated a plugin, to be able to pretent being a storage engine, to be able to force two phase commit (2PC).
2PC is used to commit or rollback a transaction atomically in more than one engine. And MySQL only uses 2PC when a transaction spans more than one engine. But we want a transaction to be committed in the engine and written to a binlog - and it should be done atomically too. So, binlog pretends to be an engine, pretends to participate in a transaction, which forces MySQL to use 2PC which can guarantee that a transaction is either committed and written to a binlog - or not committed and not written. Reliably, even if MySQL crashes in the middle.
So, if you don't use innodb, or don't use innodb_flush_log_at_trx_commit=1 or don't use sync_binlog=1 you probably don't need to support 2PC as your setup is not crash proof anyway.
If you care about innodb<->binlog consistency in the presence of crashes, you should support 2PC in your binlog, as explained above.
We DO have a small percentage of innodb databases. If 2PC commits (rolls back, etc.) atomically to more than one engine, does that mean TWO (only)? ... or could it happen for three (specifically, innodb, MYSQL_BIN_LOG's binlog, and DB_BIN_LOG's binlog)?
See http://en.wikipedia.org/wiki/Two-phase_commit_protocol It's two phases, not two engines, any number of engines can participate in a transaction. But! All of them but one have to support two-phase commit. Binlog does not support it. Apparently, DB_BIN_LOG will not support it either - adding two-phase commit support to binlog will surely be beyond the scope of your work. It means, you cannot have InnoDB, binlog, and DB_BIN_LOG to participate in a two-phase commit. But you can disable MYSQL_BIN_LOG altogether and only use DB_BIN_LOG - in that case you can have a two-phase commit between DB_BIN_LOG and InnoDB.
The two kinds of binlog files would have the same format (except that one is for only one database). Are they considered two storage engines/plugins?
If you want to write to both - yes. A transaction will need to be written to both independently, so you want them to be in sync, and MySQL uses 2PC to synchronize different data storages. You can solve that by 1) not using MYSQL_BIN_LOG. In this case there's only InnoDB and DB_BIN_LOG to keep in sync, and InnoDB can do 2PC. 2) not making DB_BIN_LOG independent from MYSQL_BIN_LOG. For example, everything is written to binlog only. Then you have a background thread (or even a separate process) that reads binlog and writes your per-db binlogs. In that case you have only InnoDB and MYSQL_BIN_LOG to keep in sync, and MySQL already does that. Your background thread is only catching up, it is allowed to fall behind.
(I notice that the "mysql_declare_plugin(binlog)" declaration is at the bottom of log.cc. Most of the other storage engines' corresponding declarations are in files in the storage subdirectories. If I were to need a "mysql_declare_plugin(db_binlog)" declaration, could it also go at the bottom of log.cc? ... Or (since a DB_BIN_LOG is AKO [a kind of] MYSQL_BIN_LOG [in the OO sense]), are MYSQL_BIN_LOGs and DB_BIN_LOGs both kinds of binlog plugins ...?
if you want to have two plugins in one file you still need to use one mysql_declare_plugin(), not two. See example at the end of http://www.packtpub.com/article/look-inside-mysql-daemon-plugin Regards, Sergei
Sergei, Thanks for all the helpful links! I've been reading them as I downloaded 5.5 with bzr and building the source. (In the process I stumbled onto your and Andrew's new book on plugins. I bought an electronic copy and downloaded it to my iPad to read tonight and this weekend. Hopefully the book will demystify them for me. :-) More later ... Thanks again! Weldon On Thu, Sep 16, 2010 at 6:34 AM, Sergei Golubchik <serg@askmonty.org> wrote:
Hi, Weldon!
On Sep 15, Weldon Whipple wrote:
See where mysql_bin_log.write is called.
Excellent information! (I was hoping it might be that easy ???). If I succeed in making a DB_BIN_LOG class derive from MYSQL_BIN_LOG (with an instance of DB_BIN_LOG for each database that is currently binlogging), I could--in theory, at least--just call something like db_binlog->write () at those places.
Better make this write method virtual and implement it in your derived class.
Questions I Have. ----------------
1. I notice that binlog is in the list of engines and plugins (displayed by SHOW ...). The bottom of log.cc has "mysql_declare_plugin(binlog)". Do I need one for db_plugin? (probably not?)
depends. binlog needs to be declated a plugin, to be able to pretent being a storage engine, to be able to force two phase commit (2PC).
2PC is used to commit or rollback a transaction atomically in more than one engine. And MySQL only uses 2PC when a transaction spans more than one engine. But we want a transaction to be committed in the engine and written to a binlog - and it should be done atomically too. So, binlog pretends to be an engine, pretends to participate in a transaction, which forces MySQL to use 2PC which can guarantee that a transaction is either committed and written to a binlog - or not committed and not written. Reliably, even if MySQL crashes in the middle.
So, if you don't use innodb, or don't use innodb_flush_log_at_trx_commit=1 or don't use sync_binlog=1 you probably don't need to support 2PC as your setup is not crash proof anyway.
If you care about innodb<->binlog consistency in the presence of crashes, you should support 2PC in your binlog, as explained above.
We DO have a small percentage of innodb databases. If 2PC commits (rolls back, etc.) atomically to more than one engine, does that mean TWO (only)? ... or could it happen for three (specifically, innodb, MYSQL_BIN_LOG's binlog, and DB_BIN_LOG's binlog)?
See http://en.wikipedia.org/wiki/Two-phase_commit_protocol
It's two phases, not two engines, any number of engines can participate in a transaction.
But! All of them but one have to support two-phase commit. Binlog does not support it. Apparently, DB_BIN_LOG will not support it either - adding two-phase commit support to binlog will surely be beyond the scope of your work.
It means, you cannot have InnoDB, binlog, and DB_BIN_LOG to participate in a two-phase commit. But you can disable MYSQL_BIN_LOG altogether and only use DB_BIN_LOG - in that case you can have a two-phase commit between DB_BIN_LOG and InnoDB.
The two kinds of binlog files would have the same format (except that one is for only one database). Are they considered two storage engines/plugins?
If you want to write to both - yes. A transaction will need to be written to both independently, so you want them to be in sync, and MySQL uses 2PC to synchronize different data storages.
You can solve that by
1) not using MYSQL_BIN_LOG. In this case there's only InnoDB and DB_BIN_LOG to keep in sync, and InnoDB can do 2PC.
2) not making DB_BIN_LOG independent from MYSQL_BIN_LOG. For example, everything is written to binlog only. Then you have a background thread (or even a separate process) that reads binlog and writes your per-db binlogs. In that case you have only InnoDB and MYSQL_BIN_LOG to keep in sync, and MySQL already does that. Your background thread is only catching up, it is allowed to fall behind.
(I notice that the "mysql_declare_plugin(binlog)" declaration is at the bottom of log.cc. Most of the other storage engines' corresponding declarations are in files in the storage subdirectories. If I were to need a "mysql_declare_plugin(db_binlog)" declaration, could it also go at the bottom of log.cc? ... Or (since a DB_BIN_LOG is AKO [a kind of] MYSQL_BIN_LOG [in the OO sense]), are MYSQL_BIN_LOGs and DB_BIN_LOGs both kinds of binlog plugins ...?
if you want to have two plugins in one file you still need to use one mysql_declare_plugin(), not two. See example at the end of http://www.packtpub.com/article/look-inside-mysql-daemon-plugin
Regards, Sergei
Hi, Weldon! On Sep 16, Weldon Whipple wrote:
Sergei,
Thanks for all the helpful links! I've been reading them as I downloaded 5.5 with bzr and building the source. (In the process I stumbled onto your and Andrew's new book on plugins. I bought an electronic copy and downloaded it to my iPad to read tonight and this weekend. Hopefully the book will demystify them for me. :-)
Yes, I hope too ;) By the way, it has two chapters on information_schema plugins and one chapter (daemon plugins) that explains how to create and use system variables in plugins (but not sys_vars.cc stuff). Regards, Sergei
It's been awhile since I first posted about db-binlogging (actually, file-per-database binlogging). Our requirement is to enable the migration of a user account--including the user's MySQL database(s)--from one physical server to another, minimizing database downtime. After being briefly sidetracked to work on two other projects, it LOOKS like I have an implementation that fulfills the needs of my employer. In a typical migration, our tools (likely implemented in perl) would issue this command sequence (where db1 and db2 are the user's databases): START BINLOG FOR DATABASE db1; START BINLOG FOR DATABASE db2; FLUSH TABLES db1.t1, db1.t2, db1.t3, db2.t1, db2.t2 WITH READ LOCK; SHOW DATABASE BINLOG STATUS; # Remember binlog coordinates for db1 and db2 [Issue mysqldump or other appropriate commands to export the databases and copy them to new server] UNLOCK TABLES; # Unlocks tables locked earlier in this session [Use mysqlbinlog command or other commands to bring up user databases on new physical server--already running mysqld to support existing users.] [On original server, issue SHOW DATABASE BINLOG STATUS to see if changes have occurred since first SHOW DATABASE BINLOG STATUS.] If databases have changed, then (on originating server): - flush the same tables as above with READ LOCK - STOP BINLOG FOR DATABASE db1; - STOP BINLOG FOR DATABASE db2; - Copy binlogs to new server and issue mysqlbinlog command, specifying coordinates from initial SHOW DATABASE BINLOG STATUS /* If no databases changed during the migration, skip the above "IF" */ (Note: on most account migrations, databases change infrequently, so the initial data copied from the old server to the new server will still be up-to-date.) Change routers to direct user's traffic to new server that has user's MySQL databases. The tool writers here say that the FLUSH TABLES ... WITH READ LOCK and UNLOCK TABLES are sufficiently granular for our installations. Those two commands are part of MySQL 5.5. (Well, UNLOCK TABLES has been around since before 5.5 ...) I have implemented the following in the version of MySQL 5.5 I checked out of launchpad in September. START BINLOG FOR DATABASE db; SHOW DATABASE BINLOG STATUS; STOP BINLOG FOR DATABASE db; I also created a test suite in mysql-test/suite/db_binlog that passes all tests. The changes are conditionally compiled by #defining HAVE_DB_BINLOG I would appreciate putting the code out for feedback/suggestions. If there IS interest, please suggest how you want it. (E.g., a launchpad patch file, etc.) Thanks in advance! Weldon Whipple weldon@whipple.org (P.S. Because of our dependence on FLUSH TABLES t1, t2, [... tn] WITH READ LOCK, our company's rollout will wait until after 5.5 has been released and we have verified it.)
participants (2)
-
Sergei Golubchik
-
Weldon Whipple