Hi, Alex!

Continuing the old discussion...

On Jan 22, Alex Yurchenko wrote:
1) It is time to drop MASTER/SLAVE mentality. This has nothing to do with replication per se. For example, a multi-master Galera cluster is turned into master-slave simply by directing all writing transactions to a single node, without a single change in the nodes' configuration, let alone our replication API. So master-slave is purely a load balancer thing - the node that receives writes IS the master even if the slaves think otherwise.
I may still use the words "master" and "slave" below, in the sense that the part of the code that takes the changes generated by local clients and sends them out can be called "master", and the part of the code that receives and applies them can be called "slave". Both can be active on the same node, though.
2) It is time to drop SYNCHRONOUS/ASYNCHRONOUS mentality. Although Galera cluster currently supports only synchronous operation, it can be turned into an asynchronous one with rather small changes in the code - again without any changes to the API. This is merely a quality of the replication engine.
Agree.
So when refactoring the replication code and API we suggest thinking of replication as a redundancy service, and establishing a general API for such a service that can be utilized by different implementations with different qualities of service. In other words - make the whole replication system a plugin (like storage engines are), not only some measly filters.
Ok, here I describe a possible model of what it can look like in the server:

* there are replication _events_ - they represent changes to the data, like creation of a table, or updating of a row.
* there are event _generators_ or _producers_ - facilities that generate events. For example, an "SBR producer" generates a stream of events with the SQL statements, for statement-based replication. There can also be an "RBR producer" or, for example, a "MyISAM physical producer" that generates events in terms of pwrite() calls.
* there are event _consumers_ - they connect to producers and consume the generated events. For example, a filter, such as one that only allows changes to a certain table to be replicated, is both a consumer and a producer of events.
* when events are sent to slaves, it's again just a producer/consumer pair - events on the master disappear in the consumer, events on the slave come out from a producer.
* events can be _nested_ - one INSERT ... SELECT statement is one SBR event, but it corresponds to many RBR events, and every RBR event may correspond to many "MyISAM pwrite()" events.
* not everything can be replicated at every level: for example, table creation cannot be replicated row-based, and InnoDB changes cannot be replicated with "MyISAM pwrite()" events.
* it is up to the event generation facility to make sure its stream of events is complete. It is implemented by fetching events from the upper level: for example, the RBR producer connects - as a consumer - to the SBR producer, and when there are SBR events without nested RBR events it simply reads the corresponding SBR events and sends them out.
* a consumer may know the event format and look at the data fields, or it may not. For example, a filter that adds checksums to events, or a consumer that sends events to slaves, does not need to care about the event format. But a "final consumer" - the one that ultimately applies events on the slave side - evidently has to know how to parse the event.
* there's no explicit global transaction ID here, but I presume there can be a filter that adds it to events. That would work, as long as replication decides on the commit order (which it does, even now in MySQL/MariaDB).

This model seems to allow both native MySQL replication - SBR, RBR, and mixed, with exactly the same protocol on the wire - and different extensions, like semisync or fully synchronous replication, heterogeneous replication, arbitrary transport protocols, and so on. It looks like it can be completely compatible with MySQL replication if necessary, or use something absolutely different, depending on what plugins are loaded and how they are connected.

But the model itself has no notion of "master node" or "slave node", synchronous or asynchronous, binlog, MySQL protocol, relay log, or even SBR/RBR/MIXED modes.

Regards,
Sergei
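
P.S. To make the producer/consumer model above a bit more concrete, here is a rough sketch in Python. It is only an illustration, not a proposed server API: every name in it (Event, sbr_producer, rbr_producer, table_filter, add_sequence_id, apply_all) is made up for this example, the events are hard-coded samples, and the per-event sequence number is just a simplified stand-in for a global transaction ID. Each stage consumes the stream of the stage above it and produces a stream of its own.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Event:
    """A replication event. All fields are hypothetical, for illustration only."""
    level: str                      # "SBR" or "RBR" in this sketch
    table: str
    payload: str
    nested: List["Event"] = field(default_factory=list)  # lower-level events
    meta: dict = field(default_factory=dict)              # data added by filters


def sbr_producer() -> Iterator[Event]:
    """Statement-level producer; yields hard-coded sample events."""
    # Table creation has no row-based form, so it carries no nested events.
    yield Event("SBR", "t1", "CREATE TABLE t1 (a INT)")
    yield Event("SBR", "t1", "INSERT INTO t1 SELECT a FROM t2",
                nested=[Event("RBR", "t1", "write_row a=1"),
                        Event("RBR", "t1", "write_row a=2")])
    yield Event("SBR", "t3", "UPDATE t3 SET a=a+1",
                nested=[Event("RBR", "t3", "update_row a=5->6")])


def rbr_producer(upstream: Iterable[Event]) -> Iterator[Event]:
    """Consumes SBR events and produces a complete row-level stream.

    An SBR event without nested RBR events is passed through as-is,
    so that nothing is lost from the stream.
    """
    for ev in upstream:
        if ev.nested:
            yield from ev.nested
        else:
            yield ev


def table_filter(upstream: Iterable[Event], table: str) -> Iterator[Event]:
    """A filter is both a consumer and a producer: keep one table only."""
    for ev in upstream:
        if ev.table == table:
            yield ev


def add_sequence_id(upstream: Iterable[Event]) -> Iterator[Event]:
    """Stamps events with an increasing id, in the order they arrive.

    It never looks at the payload, so it does not need to know the format.
    """
    for seq, ev in enumerate(upstream, start=1):
        ev.meta["seq"] = seq
        yield ev


def apply_all(upstream: Iterable[Event]) -> List[str]:
    """Final consumer: 'applies' events, here simply by recording them."""
    return [f'{ev.meta.get("seq")}:{ev.level}:{ev.payload}' for ev in upstream]


# Connect the stages: SBR producer -> RBR producer -> filter -> id stamp -> apply.
pipeline = add_sequence_id(table_filter(rbr_producer(sbr_producer()), "t1"))
applied = apply_all(pipeline)
for line in applied:
    print(line)
```

Nothing in the sketch knows about master or slave nodes, or about the transport: sending events to another node would be one more consumer on one side and one more producer on the other.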