Re: [Maria-discuss] Per-db binlogging
+maria-discuss as they are working in replication too +robert.hodges as tungsten replicator has features that could be used here I am very interested in a subset of this. I want one SQL thread per database on a slave (stream per database on the slave). However I don't need to split the binlog on the master to get that and writing a binlog file per database on the master might complicate the sync_binlog option that keeps the binlog and InnoDB in sync during crash recovery. How will this work in that case? Note that a stream (SQL slave thread) per database on the slave will break rpl_transaction_enabled. Some of us rely on that to keep slaves consistent during crash recovery. I assume you don't use that feature. MySQL has worklogs to use db tables in place of relay-log.info and when that code is production ready, then this is not a problem. On Mon, Sep 13, 2010 at 1:52 PM, Weldon Whipple <weldon@whipple.org> wrote:
Greetings!
My employer has asked me to implement per-DB binlogging for MySQL. I've been working on prototypes for 3-4 weeks now, and decided it's time to post to this list to ask for comments, suggestions, condolences, etc. (I've been a "lurker" for quite awhile. I hope this post isn't too far off the mark) The assignment is *only* for master-side binlogging. (It doesn't require a slave/replication-side implementation.)
Why I Want to Do It. -------------------
I work for an ISP that hosts (last I heard--I HOPE I'm not lying) over 2 million domains. Customers are allowed to have (almost) unlimited MySQL databases. Migrating an account from one physical server to another is a daunting task. (A typical server has easily 2000+ [sometimes far more] databases.) Global mysqld binlogging isn't feasible when we want to migrate (for example) a single account to another server.
Our proposed scenario goes something like this:
1. START USER BINLOG FOR DATABASE <db> 2. FLUSH USER TABLES WITH READ LOCK FOR DATABASE <db> 3. SHOW USER BINLOG STATUS [WHERE DATABASE LIKE <pattern>]
SHOW output might look something like: ================================= mysql> show user binlog status; +----------------------+----------+---------------------------------+------------+ | Database | User | Binlog_file | Binlog_pos | +----------------------+----------+---------------------------------+------------+ | booklibs | wwhipple | booklibs-bin.000003 | 248 | | charmins_gravestones | charmins | charmins_gravestones-bin.000001 | 4 | +----------------------+----------+---------------------------------+------------+ 2 rows in set (0.01 sec)
The "where" clause also works:
mysql> show user binlog status where user like 'charmins'; +----------------------+----------+---------------------------------+------------+ | Database | User | Binlog_file | Binlog_pos | +----------------------+----------+---------------------------------+------------+ | charmins_gravestones | charmins | charmins_gravestones-bin.000001 | 4 | +----------------------+----------+---------------------------------+------------+ 1 row in set (0.01 sec) =================================
At this point the migration tools note the binlog coordinates and start copying the databases to the new server. (For myisam tables, the tools writers want to just copy [scp?] the files. For innodb, they will use something like mysqldump.)
(Note: We might want to "lose" the "User" column in the above table] in our recent designs.)
Then the sequence of MySQL commands resumes:
4. UNLOCK USER BINLOG FOR DATABASE <db>
Finally, when migration is complete:
5. STOP USER BINLOG FOR DATABASE <db>
A high percentage of users (infrequent bloggers, etc.) go several days (sometimes even weeks or months) without updating their databases. For them--after migrating the files to the new server, a final check of the binlog coordinates on the old box can verify that nothing has changed.
If it HAS changed, migration can copy the binlog file(s) to the new box and use the mysqlbinlog command (specifying the starting coordinates) to bring the new database in sync with the old one.
(This scenario is probably oversimplified, ignoring things like DNS caching, TTL, etc., etc. Hopefully it doesn't include too many lies.)
How I Plan to Do It. -------------------
I've noticed that:
1. The class MYSQL_BIN_LOG seems to be involved in most binlogging 2. MYSQL_BIN_LOG inherits from TC_LOG and MYSQL_LOG. It is created in mysqld.cc at startup.
My current partial implementation has:
1. Class DB_BIN_LOG inheriting from MYSQL_BIN_LOG, adding members like db_name db_binlog_name db_index_name and methods to access them, etc.
2. Singleton class USER_BIN_LOG_MGR. (Should probably be DB_BIN_LOG_MGR), which maintains a db_binlog_hash of information about each DB being binlogged. A DB_BIN_LOG_ENTRY (in the hash) contains: dbname locked (bool)--has it been locked by FLUSH ... TABLES WITH READ LOCK? binlog file binlog_pos is_active a pointer to a DB_BIN_LOG instance
USER_BINLOG_MGR has a rw lock to protect access to its hash of currently binlogging db's. (Most accesses will probably be read-only?)
Is it plausible to identify all places where a binlog addition might occur? In those places, if the singleton USER_BIN_LOG_MRG exists, ask it if the current DB is being binlogged. If so, send the request to the DB_BIN_LOG instance.
Questions I Have. ----------------
1. I notice that binlog is in the list of engines and plugins (displayed by SHOW ...). The bottom of log.cc has "mysql_declare_plugin(binlog)". Do I need one for db_plugin? (probably not?) 2. I've diagrammed the binary log event taxonomy--the 24 classes that inherit from Log_event (in log_event.cc/h). That was my main motivation for wanting to inherit from MYSQL_BIN_LOG. 3. On the other hand, I wonder if inheriting from MYSQL_BIN_LOG might be overkill for "master only" binlogging--without the slave/replication part of it. 4. What version should I be working on? (We're currently running 5.1.47.) What should I check out of bzr/launchpad? (Is that still what we're doing with the changes in ownership?) 5. I see that THD is involved in binlogging. (Several THD methods are defined in log.cc.) Are there any caveats here? 6. What files should I use? A new h/cc pair? ... or just add to the hierarchy in log.h/cc?
In an ideal world, it would be nice to inherit polymorphically from MYSQL_BIN_LOG, overriding methods that set the names of the binlog file and index file (etc.) then just call the methods and have it do the "right thing." On the other hand, I get the impression that I might be oversimplifying.
I welcome your advice!!
What I've Implemented So Far (about 13 Prototypes). --------------------------------------------------
(PRELIMINARY NOTE:)
Our initial plan was to call it "Per-User Binlogging". The user here is a cPanel [control panel] user. cPanel's database naming convention is to have all of a user's database names start with the username, followed by an underscore, followed by a distinctive database name. Thus all the commands above start with
<verb> USER ...
Originally most of them could end either "DATABASE <db>" or "USER <user>" (with user meaning cPanel user). Thus, in the original plan the first command above was:
START USER BINLOG FOR (USER <user> | DATABASE <db>)
We have realized that per-database binlogging is all we really need. We imagine that others in our situation might want something similar. Since not everyone uses cPanel, we should forget per[cPanel]-user binlogging, focusing on per-database binlogging.
I have several versions of lex and bison/yacc files (sql_yacc.yy, lex.h) to implement the above (and earlier variants). The recent versions avoid increasing the %expect'ed 169 shift/reduce conflicts specified in sql_yacc.yy.
(END OF PRELIMINARY NOTE)
I currently have 13 different Subversion branches with different (partial) implementations.
1. All have some lex/bison grammer defined. (I'm not completely satisfied with any of them.)
2. Per-db binlogging is enabled by two my.cnf (or command-line) options:
user_binlog_dir=/full/directory/path user_binlog_persist_file=user_binlog.info
Our servers have fast solid-state drives that we will use for per-db binlogging. We want to specify a separate directory to store all the per-db binlogs
The persist_file stores information about the current state of db-binlogging--to survive a restart.
Just about every instance of "user_binlog" in this note should probably be changed to "db_binlog" ... (???)
3. I have added cases in sql_parse.cc for
SQLCOM_START_USER_BINLOG_DB: SQLCOM_FLUSH_AND_LOCK_DB: SQLCOM_SHOW_USER_BINLOG_STATUS: SQLCOM_UNLOCK_DB: SQLCOM_STOP_USER_BINLOG_DB:
(as well as some "USER" variants like SQLCOM_START_USER_BINLOG_USER:, which should probably abandoned.)
They should probably be changed to something like:
SQLCOM_START_DB_BINLOG: SQLCOM_FLUSH_AND_LOCK_DB: SQLCOM_SHOW_DB_BINLOG_STATUS: SQLCOM_UNLOCK_DB: SQLCOM_STOP_DB_BINLOG:
-----------------
Now that you've read it, please feel free to throw darts, etc. At this point I'm willing to discard everytning and start over. (In the worst case--if everyone on this list thinks per-db binlogging is a terrible idea--I guess I'll be forced to create our own patch [sigh].)
On the other hand, I would love to help contribute to this effort. (If per-db binlogging is already underway, let me know, and I'll likely help.) Thanks in advance for feedback/suggestions/help. Feel free to contact me privately if you want to.
Weldon Whipple weldon@whipple.org
-- MySQL Internals Mailing List For list archives: http://lists.mysql.com/internals To unsubscribe: http://lists.mysql.com/internals?unsub=mdcallag@gmail.com
-- Mark Callaghan mdcallag@gmail.com
On Tue, Sep 14, 2010 at 3:02 AM, MARK CALLAGHAN <mdcallag@gmail.com> wrote:
+maria-discuss as they are working in replication too
To be exact, maria-developers@ is the counterpart to internals@mysql, although everyone should be reading maria-discuss@ too. I'm now leaving both in CC.
On Mon, Sep 13, 2010 at 1:52 PM, Weldon Whipple <weldon@whipple.org> wrote:
Greetings!
My employer has asked me to implement per-DB binlogging for MySQL. I've been working on prototypes for 3-4 weeks now, and decided it's time to post to this list to ask for comments, suggestions, condolences, etc. (I've been a "lurker" for quite awhile. I hope this post isn't too far off the mark) The assignment is *only* for master-side binlogging. (It doesn't require a slave/replication-side implementation.)
Why I Want to Do It. -------------------
I work for an ISP that hosts (last I heard--I HOPE I'm not lying) over 2 million domains. Customers are allowed to have (almost) unlimited MySQL databases. Migrating an account from one physical server to another is a daunting task. (A typical server has easily 2000+ [sometimes far more] databases.) Global mysqld binlogging isn't feasible when we want to migrate (for example) a single account to another server.
Our proposed scenario goes something like this: <cut>
Do I understand correctly, that in the normal case you do not run with binlog on at all? You would only turn on this kind of per-database binlog when needed? If that would be the case, then I don't see why --binlog-do-db wouldn't already do exactly what you want, except that you'd require a restart to change it. So all you need to implement would be the ability to set it as a system variable and not just in the conf file. (http://dev.mysql.com/doc/refman/5.1/en/replication-options-binary-log.html#o...) Alternatively, if you are also running a "global" binlog, then as Mark suggested, don't write additional per database binlogs, rather get your information from the "global" binlog and filter out what should be replicated instead by using --replicate-do-db. Again, it is not currently possible to set this on/off without restart. (http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#option...) Finally, for anyone wishing to implement their own binlogging and replication, I should advertise our work in MariaDB on a generic "replication plugin API". Using this you could implement your own replication plugin that would do precisely what you want, but you could still leave the original replication code untouched. (You could of course "fork" it to make your own plugin if you wanted.) Kristian Nielsen in CC is working on that, and you can read about it here: http://askmonty.org/wiki/ReplicationProject http://askmonty.org/worklog/?tid=107 http://askmonty.org/worklog/?tid=120 henrik PS: It is not exactly the same use case, but also see mine and Igor's presentation from last years MySQL user conference. http://assets.en.oreilly.com/1/event/36/Valuable%20MariaDB%20Features%20That... ...from slide 10 onwards. Summary: You can use mysqlbinlog instead of replication and this allows all kinds of scriptable hacks to occur between master and slave. henrik -- henrik.ingo@avoinelama.fi +358-40-5697354 www.openlife.cc
Henrik, Thanks for your response! (See my replies below.) On Wed, Sep 15, 2010 at 2:33 PM, Henrik Ingo <henrik.ingo@avoinelama.fi> wrote:
On Tue, Sep 14, 2010 at 3:02 AM, MARK CALLAGHAN <mdcallag@gmail.com> wrote:
+maria-discuss as they are working in replication too
To be exact, maria-developers@ is the counterpart to internals@mysql, although everyone should be reading maria-discuss@ too. I'm now leaving both in CC.
I'm replying leaving all your addressees in my reply to this note. (I hope there aren't too many duplicates!) <snip/>
Do I understand correctly, that in the normal case you do not run with binlog on at all?
Yes. (We have tried it a few times in the past for load balancing, etc., but aren't using it present.)
You would only turn on this kind of per-database binlog when needed?
Yes--when we want to migrate a user's databases from one server to another.
If that would be the case, then I don't see why --binlog-do-db wouldn't already do exactly what you want, except that you'd require a restart to change it. So all you need to implement would be the ability to set it as a system variable and not just in the conf file. (http://dev.mysql.com/doc/refman/5.1/en/replication-options-binary-log.html#o...)
That is almost (precisely) what we want! I have actually written some local hacks where my.cnf and command-line options are also settable as system variables. I could probably do that easily. The only difference is that we want multiple output binlog files--one per database. It is common for our users (who are serviced on 100s or 1000s of physical servers in our data centers) to want/need to migrate to a different physical server--to any one of the other servers in our data centers. In that case, it seems to make the most sense to have separate binlog files, which we can copy to the destination servers if needed. If--in the period between dumping the databases on the source server and importing it onto the new server (and just before going online on the new server), we see that the binlog coordinates for a given database have changed on the source server, we can copy the binlog file(s) from the source to the destination server and run mysqlbinlog (specifying the same coordinates displayed before the initial dump) to bring the database up-to-date, then bring the customer online on the new server (and offline on the old server. On a given day, different customers on one server will often migrate to many different servers. It would be impractical to have many slave servers listening for changes on that one "source" server. Also, any one destination server might need to listen to several source masters at any given time. (This could lead to a combinatoric explosion--or at least a difficult situation for administrators/migration tools.) I'm jumping to a later line in your reply--the one about the mysqlbinlog command. Because of the large number of combinations of migration sources and destinations, we are looking at using the mysqlbinlog command on the destination servers: If a second check of the source server's single-database-binlog coordinates (filename/offset) shows a change, we will copy what binlog file(s) are necessary from the old to the new server and run mysqlbinlog--just before "flipping the switch" to bring the customer online on the new server/offline at the old server. Perhaps a different approach might be to modify (if enabled by some option) the binlog-do-db behavior (if controlled by a global variable) to open a database-specific binlog file. I've noticed in a later post by Sergei that "FLUSH TABLES WITH READ LOCK FOR DATABASE dbname" would be difficult to implement. If we decide to skip that command (in favor of a global version?) in our first iteration, we would be left with: START, SHOW coordinates, STOP--all for a single database--to a single binlog file (set) and index. I'm guessing that "UNLOCK DATABASE dbname for BINLOG" would be as difficult as the FLUSH/LOCK command ... If that is the case: What is the overhead/time involved for a mysqld with thousands of databases to run the global "FLUSH TABLES WITH READ LOCK" and "UNLOCK TABLES" whenever we want to dump a database on the source server before copying (scp) the data to the new server ...?
Alternatively, if you are also running a "global" binlog, then as Mark suggested, don't write additional per database binlogs, rather get your information from the "global" binlog and filter out what should be replicated instead by using --replicate-do-db. Again, it is not currently possible to set this on/off without restart. (http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#option...)
Finally, for anyone wishing to implement their own binlogging and replication, I should advertise our work in MariaDB on a generic "replication plugin API". Using this you could implement your own replication plugin that would do precisely what you want, but you could still leave the original replication code untouched. (You could of course "fork" it to make your own plugin if you wanted.) Kristian Nielsen in CC is working on that, and you can read about it here: http://askmonty.org/wiki/ReplicationProject http://askmonty.org/worklog/?tid=107 http://askmonty.org/worklog/?tid=120
I will definitely look at the above links.
henrik PS: It is not exactly the same use case, but also see mine and Igor's presentation from last years MySQL user conference. http://assets.en.oreilly.com/1/event/36/Valuable%20MariaDB%20Features%20That... ...from slide 10 onwards. Summary: You can use mysqlbinlog instead of replication and this allows all kinds of scriptable hacks to occur between master and slave.
I will read about mysqlbinlog from your slides!
henrik
If I'm overlooking some brain-dead easy solution to this migration requirement, feel free to enlighten me.!! Thanks again! Weldon
participants (3)
-
Henrik Ingo
-
MARK CALLAGHAN
-
Weldon Whipple