Hi Brian,

On 1/27/10 12:51 PM PST, "Brian Evans" <grknight@scent-team.com> wrote:
I was reading over the replication discussion on maria-developers and thought I'd share some thoughts. They are not very technical, as I do not understand the back end that well.
Our system is currently a master-master WAN replication setup, with each side writing to a different DB and an offline process doing the copies. Our databases make extensive use of auto-increments on tables (more than necessary, but that's another story).
Would future developments allow synchronous replication to occur on a single database when the WAN link is a single point of failure (SPoF)? How might an auto-increment system handle such a failure?
This SPoF is our network link, which can suffer transient problems outside our control. Both sides may need to keep inserting data during such an outage. This is not an issue 99% of the time, but it's better to prepare for downtime than to try to fix data errors in that 1%.
Given the requirement you just stated, synchronous replication is not an option, because your database will become unavailable whenever the WAN connection goes down. This is not a limitation of current implementations. It is a consequence of the problem of distributed consensus, which is something like the halting problem for distributed systems: distributed databases cannot be guaranteed to have the same contents if one of them is allowed to fail.

Instead, there are workarounds:

1.) Keep masters in both locations, as you seem to describe, and replicate to backup copies on the other site.

2.) Use auto-increment-offset, as described by Arjen up-thread (see the P.S. below for a rough configuration sketch). Note that this only handles inserts; it does not protect against other types of conflicts, such as adding a child row with a particular FK on one site while deleting its parent row, and hence its PK, on the other.

3.) Rewrite your data model to eliminate conflicts. In that case you are better off dispensing with auto-increment entirely.

4.) Route all updates to a single master for processing and then replicate them back to local slaves. In this case you must enqueue updates on remote sites during an outage (i.e., accept them provisionally); they don't commit fully until the link is back up. This usually involves not just a data model rewrite but re-architecting the application.

What's wrong with the approach you are using now? Unless I'm mistaken, you don't have a problem with outages today, because you can continue to update both masters and replicate once the link resumes. If you are looking for full data consistency across sites, that problem is not solvable, but others may be.

Cheers, Robert
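P.S. For the archives, here is roughly what the auto-increment-offset setup from option 2 looks like. auto_increment_increment and auto_increment_offset are the standard MySQL/MariaDB server variables; the values below are only a sketch for a hypothetical two-site deployment, not a drop-in config:

  # Site A my.cnf -- this master generates odd ids: 1, 3, 5, ...
  [mysqld]
  auto_increment_increment = 2   # step by the number of masters
  auto_increment_offset    = 1   # this site's slot

  # Site B my.cnf -- this master generates even ids: 2, 4, 6, ...
  [mysqld]
  auto_increment_increment = 2
  auto_increment_offset    = 2

With the increment equal to the number of masters and a distinct offset per master, both sides can keep inserting during an outage without ever generating the same auto-increment value. As noted above, though, this does nothing for update/delete conflicts.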