Sergei, Thank you very much for your reply! A later post by Henrik Ingo seems to say that I should cc maria-discuss@lists.launchpad.net. I'll try it and see if it bounces--I THINK I'm registered with launchpad.net :-) Let me know if sending to both lists annoys anyone, and I will quit the cc :-) See my responses below On Wed, Sep 15, 2010 at 1:53 PM, Sergei Golubchik <serg@askmonty.org> wrote:
Hi, Weldon!
On Sep 13, Weldon Whipple wrote:
<snip/>
Is it plausible to identify all places where a binlog addition might occur? In those places, if the singleton USER_BIN_LOG_MRG exists, ask it if the current DB is being binlogged. If so, send the request to the DB_BIN_LOG instance.
By "addition" you mean when an event is written to binlog?
Yes! (exactly)--any event relating to that database that is written to binlog. (Hmm ... I wonder if there might ever be more "global" events that should be written ... Maybe we don't care about them??). Also, since these databases are "tied" to a user on a box, we probably don't care about renaming a DB ...
See where mysql_bin_log.write is called.
Excellent information! (I was hoping it might be that easy ???). If I succeed in making a DB_BINLOG class derive from MYSQL_BIN_LOG (with an instance of DB_BINLOG for each database that is currently binlogging), I could--in theory, at least--just call something like db_binlog->write () at those places.
Questions I Have. ----------------
1. I notice that binlog is in the list of engines and plugins (displayed by SHOW ...). The bottom of log.cc has "mysql_declare_plugin(binlog)". Do I need one for db_plugin? (probably not?)
depends. binlog needs to be declated a plugin, to be able to pretent being a storage engine, to be able to force two phase commit (2PC).
2PC is used to commit or rollback a transaction atomically in more than one engine. And MySQL only uses 2PC when a transaction spans more than one engine. But we want a transaction to be committed in the engine and written to a binlog - and it should be done atomically too. So, binlog pretends to be an engine, pretends to participate in a transaction, which forces MySQL to use 2PC which can guarantee that a transaction is either committed and written to a binlog - or not committed and not written. Reliably, even if MySQL crashes in the middle.
So, if you don't use innodb, or don't use innodb_flush_log_at_trx_commit=1 or don't use sync_binlog=1 you probably don't need to support 2PC as your setup is not crash proof anyway.
If you care about innodb<->binlog consistency in the presence of crashes, you should support 2PC in your binlog, as explained above.
We DO have a small percentage of innodb databases. If 2PC commits (rolls back, etc.) atomically to more than one engine, does that mean TWO (only)? ... or could it happen for three (specifically, innodb, MYSQL_BIN_LOG's binlog, and DB_BIN_LOG's binlog)? The two kinds of binlog files would have the same format (except that one is for only one database). Are they considered two storage engines/plugins? (I notice that the "mysql_declare_plugin(binlog)" declaration is at the bottom of log.cc. Most of the other storage engines' corresponding declarations are in files in the storage subdirectories. If I were to need a "mysql_declare_plugin(db_binlog)" declaration, could it also go at the bottom of log.cc? ... Or (since a DB_BINLOG is AKO [a kind of] MYSQL_BIN_LOG [in the OO sense]), are MYSQL_BIN_LOGs and DB_BINLOGs both kinds of binlog plugins ...?
2. I've diagrammed the binary log event taxonomy--the 24 classes that inherit from Log_event (in log_event.cc/h). That was my main motivation for wanting to inherit from MYSQL_BIN_LOG.
Was that a question? :)
No. Just justification for my desire to simplify by having DB_BINLOG extend the MYSQL_BIN_LOG class. (I realize that I MIGHT have to add "virtual" in front of a method or two--possibly. Hopefully I would need to make no--or minimal-- changes to MYSQL_BIN_LOG.)
3. On the other hand, I wonder if inheriting from MYSQL_BIN_LOG might be overkill for "master only" binlogging--without the slave/replication part of it.
No, I think inheriting from MYSQL_BIN_LOG is correct. At least it's much easier to build on existing binlogging than to implement everything from scratch.
Amen!!
4. What version should I be working on? (We're currently running 5.1.47.) What should I check out of bzr/launchpad? (Is that still what we're doing with the changes in ownership?)
What version you plan to use? If it'll be 5.1 - you have to work in 5.1, use lp:mysql-server/5.1 branch:
bzr branch lp:mysql-server/5.1
We TRY to keep as current as possible (at the company). At some point, we'll need to move to the next major release--whatever it ends up being. (Then there is also the unknown [to me] of MySQL vs. MariaDB ... :-) We'll use bzr branch lp:mysql-server/5.1
5. I see that THD is involved in binlogging. (Several THD methods are defined in log.cc.) Are there any caveats here?
There is a binlog cache (or buffer) in THD that stores binlog events for the current transaction. They all are written to disk together, when a transaction is committed. You need to decide what you want to do when the current database is changed in the middle of a transaction - what binlog the transaction should be written to.
By "the current database is changed", I imagine you refer to the "USE 'database';" command. That's a good question. I THINK we have decided to ignore (for the present, at least), queries (and also transactions?) that involve multiple databases. Because we are doing per-DB binlogging, if multiple databases are involved, we (almost) consider it an anomaly at our ISP ... (I'm thinking that we might need to consider THD's binlog cache only for transactions that involve innodb and no more than one database ...)
6. What files should I use? A new h/cc pair? ... or just add to the hierarchy in log.h/cc?
As you like. New pair of files is cleaner. But it may require you to move some declarations from log.cc to a header file, or to make some static functions extern.
All my prototypes have just expanded what is in log.cc and log.h. (I came VERY close to creating db_binlog.cc/h, however, which I would have done in some of my past--non-MySQL--projects). However, noticing most of the logging hierarchy in the single log.cc/h pair, I decided the implementors must have done it for a reason, and have decided to follow suit--for now, at least.
What I've Implemented So Far (about 13 Prototypes). --------------------------------------------------
(PRELIMINARY NOTE:)
Our initial plan was to call it "Per-User Binlogging". The user here is a cPanel [control panel] user. cPanel's database naming convention is to have all of a user's database names start with the username, followed by an underscore, followed by a distinctive database name. Thus all the commands above start with
<verb> USER ...
Originally most of them could end either "DATABASE <db>" or "USER <user>" (with user meaning cPanel user). Thus, in the original plan the first command above was:
START USER BINLOG FOR (USER <user> | DATABASE <db>)
We have realized that per-database binlogging is all we really need. We imagine that others in our situation might want something similar. Since not everyone uses cPanel, we should forget per[cPanel]-user binlogging, focusing on per-database binlogging.
If every user is a different MySQL account - that it a different row in the mysql.user table - than per-user binlogging is much cleaner and better defined concept.
A statement can affect more than one database (say, UPDATE db1.t1, db2.t2 SET ...), the current database may be changed in the middle of transaction, and so on. On the other hand, any statement or a transaction is always executed in the context of as single user - you cannot change user in the middle of transaction. (*)
(*) changing security context only affect privileges, but not the destination binlog.
Actually (and this is because of the way cPanel does things--I have been "bitten" by this many times!!), each database has a (different) corresponding user in the mysql.user table (!!!). So (for example), if a user signs up with our ISP with a user name of elvispre every one of his (perhaps hundreds/thousands of) databases will have names something like: elvispre_friends elvispre_earlyyears elvispre_hitsongs . . . And (here's the surprising part)--each one of those databases will have a MATCHING ROW IN mysql.user!! So, for us to migrate our customer elvispre to another physical server, we DO (ultimately) need to binlog all of the databases/users that match/are like 'elvispre\_%'; . In some of my earlier prototypes, the START/FLUSH-LOCK/SHOW/UNLOCK/STOP commands have allowed for "like" clauses at the end. Maybe I need to go that route? ALL of my prototypes have had binlogging PER DATABASE. I wonder how difficult/feasible/desirable/useful it would be to create separate binlog files for databases (or users) whose name matches a "like" clause. ... Probably more important: Would this feature be of general interest to the MySQL user community at large? (... and, if so, should it be database- or user- or database_pattern- or user_pattern-based). (My initial idea is that per [single] database/user might be easiest to define/implement??)
Regards, Sergei
Thanks again for your reply!! Weldon