Robert Hodges <robert.hodges@continuent.com> writes:
First of all, we Continuent Tungsten folk have a certain set of problems we solve with replication. Here are the key use cases:
3. Replicating heterogeneously between MySQL and other databases like Oracle. This requires the ability to filter and transform data easily. Another use case of heterogeneous replication is copying across databases of the same type for application upgrades and migration between database versions.
Yes, this is quite interesting, and somewhat different from normal MySQL->MySQL replication.
4. Ensuring full data protection such that data, once committed, are not lost or corrupted. This includes replicating [semi-]synchronously to slaves, performing consistency checks on data, performing point-in-time restoration of data (e.g., using backups + a change log), etc.
And also reliable crash recovery. Which I think is not there in 5.1, and in 5.5 is implemented in a way that I fear comes with too high a performance cost for many of the applications that need it the most (too many fsync()s).
how Tungsten works). Here are some features that would make it easier to work with the existing replication implementation:
1.) Synchronous replication. It's enough if replication slaves can hold up commit notices on the master. The MySQL 5.5 features look like a good start but I have not started the implementation and have therefore not hit the sharp corners.
As I understand it, synchronous replication based on current 5.5 features would first start commit on master, then send binlog to slave, then run and commit transaction on slave, then finish commit on master. So transactions-per-second rate would be quite limited. But of course there are many applications where load is light and this would be useful.
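The sequence above puts a simple ceiling on throughput. A back-of-the-envelope sketch, assuming commits are strictly serialized (no group commit, no pipelining); the function name and all latency figures are hypothetical and only illustrate the arithmetic:

```python
def max_serial_tps(master_start_ms, network_rtt_ms, slave_apply_ms, master_finish_ms):
    """Upper bound on commits per second when every commit waits for the
    whole chain: start commit on master, ship binlog, apply and commit on
    slave, finish commit on master. Assumes no overlap between commits."""
    per_commit_ms = (master_start_ms + network_rtt_ms
                     + slave_apply_ms + master_finish_ms)
    return 1000.0 / per_commit_ms


# Hypothetical latencies in milliseconds, chosen only for illustration.
print(max_serial_tps(1.0, 0.5, 2.0, 0.5))  # 250.0
```

Under these assumptions the slave's apply time and the network round trip go straight into the per-commit cost, which is why the rate stays low unless commits can overlap.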
2.) CRCs. CRCs and other built-in features like native consistency checks seem like the most glaring omission in the current binlog implementation. It's difficult to ensure correct operation without them, and indeed I'm confident many MySQL bugs can be traced back to the lack of features to make data corruption more easily visible.
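To make the CRC idea concrete, here is a minimal sketch of per-event checksumming. The framing (payload followed by a 4-byte little-endian CRC32) and the names `seal_event` / `open_event` are assumptions for illustration, not the MySQL binlog format:

```python
import struct
import zlib


def seal_event(payload: bytes) -> bytes:
    """Append a 4-byte little-endian CRC32 of the payload (hypothetical framing)."""
    return payload + struct.pack("<I", zlib.crc32(payload) & 0xFFFFFFFF)


def open_event(framed: bytes) -> bytes:
    """Return the payload, or raise ValueError if the stored CRC does not match."""
    if len(framed) < 4:
        raise ValueError("event too short to carry a checksum")
    payload = framed[:-4]
    (stored,) = struct.unpack("<I", framed[-4:])
    if zlib.crc32(payload) & 0xFFFFFFFF != stored:
        raise ValueError("event checksum mismatch")
    return payload


event = seal_event(b"UPDATE t SET a = 1 WHERE id = 7")
assert open_event(event) == b"UPDATE t SET a = 1 WHERE id = 7"
```

A reader that verifies every event this way turns silent corruption on disk or on the wire into an immediate, visible error instead of a diverged slave.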
Yes. (If there were a good SQL-level way of matching binlog position/transaction ID with MVCC snapshot version, consistency checks on tables could be implemented very well and flexibly from outside the server and replication framework. This would be a very nice feature to have. Though not without problems to implement...)
3.) Self-contained transactions with adequate metadata. Row replication does not supply column names or keys which makes SQL generation difficult especially when there are differences in master and slave schema. Also,
Right... but do you suggest putting the entire table definition of every table into every transaction? Sounds a bit bloated perhaps? The row-level binlogging in MySQL is based on column index, not column name. But I understand that the ability to generate SQL (which is based on column name rather than index) would be nice.
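To illustrate the index-versus-name gap: a row image carries values by column position only, so generating SQL needs a column-name list from somewhere else. A minimal sketch, where `row_to_insert` and the sample table are hypothetical and the name list is assumed to come from metadata outside the row event:

```python
def quote_ident(name: str) -> str:
    """Quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def row_to_insert(table, column_names, row_values):
    """Build a parameterized INSERT from a positional row image.

    column_names must be supplied separately; the positional row image
    alone does not contain them.
    """
    if len(column_names) != len(row_values):
        raise ValueError("column list and row image differ in length")
    cols = ", ".join(quote_ident(c) for c in column_names)
    placeholders = ", ".join(["%s"] * len(row_values))
    sql = f"INSERT INTO {quote_ident(table)} ({cols}) VALUES ({placeholders})"
    return sql, tuple(row_values)


sql, params = row_to_insert("accounts", ["id", "owner"], [42, "alice"])
print(sql)     # INSERT INTO `accounts` (`id`, `owner`) VALUES (%s, %s)
print(params)  # (42, 'alice')
```

If the slave schema has the columns in a different order, the positional image alone would silently map values to the wrong columns; with names, the mismatch can at least be detected.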
session variables like FOREIGN_KEY_CHECKS affect DDL but are not embedded in the transaction where they are used. Finally, character set support is a
Can you elaborate? Wouldn't this also cause bugs in MySQL replication itself?
little scary based on my one experience in that area. You have to read code to get master lists of character sets; semantics are very unclear.
What are the issues with character set?
and I need to discuss that over a beer in Helsinki.) Also, transaction IDs need an unambiguous source ID or epoch number encoded in the ID so that you can detect diverging serialization histories. This is a nasty little problem that can lead to big accidents in the field.
Can you elaborate? I don't understand exactly what a "source ID" or "epoch number" would be. Can you give an example?
In fact, you could summarize 2-6 as making the binlog (whether written to disk or not) into a consistent "database" that you can move elsewhere and apply without having to add extra metadata, such as global IDs or table column names. Currently we have to regenerate the log, which is a huge waste of resources, and also have to depend on external information to derive schema definitions.
Finally, since there is already talk about rewriting replication from scratch, I would like to point out for the sake of discussion a few things that the current MySQL replication in my opinion does well. Any future system must match them.
4.) Robust. There is no lack of problems with MySQL replication but realistically any new implementation faces a high bar to function equally well. Plugin approaches like that used by Drizzle are very flexible but they also tend to have a kick-the-can-down-the-road effect in that it's up to plugins to provide a robust implementation. This in turn takes a long time to do well unless plugins cut down the problem size, for example by omitting statement replication.
Yes. I think this is a very good point. E.g. many of your points could be answered merely by making MySQL binlogging pluggable and letting Tungsten (and everyone else) just implement their own logging to fit their particular purpose. But there is also a lot to be said for providing a single really useful binlog implementation. It does not sound appealing for users to have 3 or 4 different binlogs on their systems, each supporting a particular plugin (we already have two with the engine's internal transactional log, which is arguably one too many).
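A rough sketch of what "pluggable binlogging" could look like, and why several plugins mean several logs. Every name here (`BinlogPlugin`, `Server`, `InMemoryBinlog`) is hypothetical; this is not an existing MySQL or Tungsten interface:

```python
from abc import ABC, abstractmethod


class BinlogPlugin(ABC):
    """Hypothetical hook: the server hands each committed transaction to it."""

    @abstractmethod
    def log_transaction(self, txn_id: int, events: list) -> None:
        ...


class InMemoryBinlog(BinlogPlugin):
    """Stand-in for a real log; keeps its own full copy of every transaction."""

    def __init__(self):
        self.entries = []

    def log_transaction(self, txn_id, events):
        self.entries.append((txn_id, list(events)))


class Server:
    """Toy server that calls every registered plugin at commit time."""

    def __init__(self):
        self._plugins = []

    def register(self, plugin: BinlogPlugin) -> None:
        self._plugins.append(plugin)

    def commit(self, txn_id: int, events: list) -> None:
        # Each registered plugin writes its own copy, so commit cost
        # and storage grow with the number of plugins.
        for plugin in self._plugins:
            plugin.log_transaction(txn_id, events)


server = Server()
native, tungsten_style = InMemoryBinlog(), InMemoryBinlog()
server.register(native)
server.register(tungsten_style)
server.commit(1, ["row change A", "row change B"])
assert native.entries == tungsten_style.entries
```

The flexibility is obvious, and so is the cost: two plugins end up holding two identical logs of the same transaction.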
2.) Fast. MySQL replication really rips as long as you don't have slow statements that block application on slaves or don't hit problems like the infamous InnoDB broken group commit bug (#13669) reported by Peter Zaitsev.
Well, this may be true for in-memory working sets. But if you have a larger system that does not fit in main memory and is bottlenecked by the performance of the disk system, the single-threaded slave really hurts. It makes it really hard to scale up the disk I/O on the slave. Everybody who is into larger systems seems to mention this.
* Logical replication based on an enhanced form of today's MySQL replication with substantial clean-up of existing code, simplification/enhancement of binlog event formats, and other features that we can readily agree upon in short order.
Yes. So one could imagine making replication pluggable, and moving the existing MySQL binlog into a plugin for backwards compatibility. Then we could write another plugin with an enhanced, not backward-compatible binlog containing these enhancements in a more extensible format (e.g. column names in transactions would hurt no-one if they could be easily switched on or off). (Not sure if current MySQL replication could be suitably extended without a separate plugin, but if so then so much the better).
 - Kristian.