Hi, Kristian! On Jun 24, Kristian Nielsen wrote:
-----------------------------------------------------------------------
High-Level Specification
Generators and consumers
-------------------------
We have two concepts:
1. Event _generators_, that produce events describing all changes to data in a server.
2. Event consumers, that receive such events and use them in various ways.
One example of an event generator is the execution of SQL statements, which generates events like those used for statement-based replication. Another example is PBXT engine-level replication.
An example of an event consumer is the writing of the binlog on a master.
Some event generators are not really plugins. Rather, there are specific points in the server where events are generated. However, a generator can be part of a plugin, for example a PBXT engine-level replication event generator would be part of the PBXT storage engine plugin. And for example we could write a filter plugin, which would be stacked on top of an existing generator and provide the same event types and interfaces, but filtered in some way (for example by removing certain events on the master side, or by re-writing events in certain ways).
Event consumers, on the other hand, could be plugins.
One generator can be stacked on top of another. This means that the generator on top (for example row-based events) will handle some events itself (eg. non-deterministic update in mixed-mode binlogging). Other events that it does not want to or cannot handle (for example deterministic delete or DDL) will be deferred to the generator below (for example statement-based events).
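For illustration, a minimal sketch of the stacking idea (all class and method names below are invented for this example, not part of the proposal):

/*
  Hedged sketch only: shows a generator stacked on top handling some
  changes itself and deferring the rest to the generator below it.
*/
#include <stdio.h>

class rpl_generator
{
public:
  rpl_generator(rpl_generator *lower= NULL) : m_lower(lower) {}
  virtual ~rpl_generator() {}

  /* Return true if this level produced an event for the change. */
  virtual bool generate(const char *change)= 0;

protected:
  /* Hand the change to the generator stacked below, if any. */
  bool defer(const char *change)
  { return m_lower ? m_lower->generate(change) : false; }

private:
  rpl_generator *m_lower;
};

/* Statement-based level: handles everything it is given. */
class stmt_generator : public rpl_generator
{
public:
  virtual bool generate(const char *change)
  {
    printf("statement event: %s\n", change);
    return true;
  }
};

/* Row-based level stacked on top: handles only what it must (here marked
   with '!' as a toy stand-in for "non-deterministic"), and defers the
   rest (eg. DDL, deterministic delete) to the level below. */
class row_generator : public rpl_generator
{
public:
  row_generator(rpl_generator *lower) : rpl_generator(lower) {}
  virtual bool generate(const char *change)
  {
    if (change[0] == '!')
    {
      printf("row event: %s\n", change + 1);
      return true;
    }
    return defer(change);
  }
};

int main()
{
  stmt_generator stmt;
  row_generator rows(&stmt);
  rows.generate("!UPDATE t SET a=UUID()");  /* handled at the row level    */
  rows.generate("DROP TABLE t");            /* deferred to statement level */
  return 0;
}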
There's a problem with this idea. Say, Event B is nested in Event A:

  ... ... |<- Event A ... .. .. ->| .. .. ..
      * * * * * |<- Event B ->| * * * *

This is fine. But what about

  ... ... |<- Event A ... ->| .. .. ..
      * * * * * |<- Event B ->| * * * *

In the latter case no event is nested in the other, and no level can simply defer to the other. I don't know a solution for this, I'm just hoping the above situation is impossible. At least, I could not find an example of "overlapping" events.
Default materialisation format
------------------------------

While the proposed API doesn't _require_ materialisation, we can still think about providing the _option_ for built-in materialisation. This could be useful if such materialisation is made suitable for transport to a different server (eg. no endian-dependence etc). If there is a facility for such materialisation built into the API, it becomes possible to write something like a generic binlog plugin or generic network transport plugin. This would be really useful for eg. PBXT engine-level replication, as it could be implemented without having to re-invent a binlog format.
I added in the proposed API a simple facility to materialise every event as a string of bytes. To use this, I still need to add a suitable facility to de-materialise the event.
Couldn't that be done not in the API or generator, but as a filter somewhere up the chain ?
So I think maybe it is better to add such a generic materialisation facility on top of the basic event generator API.
Ah, right.
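To make the string-of-bytes idea and the missing de-materialisation counterpart concrete, here is a toy sketch of an endian-independent encode/decode pair for a hypothetical query event (nothing below is part of the proposed API):

/*
  Sketch only: [4-byte little-endian length][query bytes], so the
  materialised form is the same regardless of the host's endianness.
*/
#include <stdio.h>
#include <string.h>

typedef unsigned char uchar;

static size_t materialise_query(const char *query, uchar *buf, size_t buflen)
{
  size_t len= strlen(query);
  if (buflen < len + 4)
    return 0;
  buf[0]= (uchar)(len & 0xff);
  buf[1]= (uchar)((len >> 8) & 0xff);
  buf[2]= (uchar)((len >> 16) & 0xff);
  buf[3]= (uchar)((len >> 24) & 0xff);
  memcpy(buf + 4, query, len);
  return len + 4;
}

static size_t dematerialise_query(const uchar *buf, char *query, size_t maxlen)
{
  size_t len= (size_t)buf[0] | ((size_t)buf[1] << 8) |
              ((size_t)buf[2] << 16) | ((size_t)buf[3] << 24);
  if (len + 1 > maxlen)
    return 0;
  memcpy(query, buf + 4, len);
  query[len]= '\0';
  return len;
}

int main()
{
  uchar buf[256];
  char out[256];
  size_t n= materialise_query("INSERT INTO t VALUES (1)", buf, sizeof(buf));
  dematerialise_query(buf, out, sizeof(out));
  printf("%zu bytes, round-trip: %s\n", n, out);
  return 0;
}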
Encapsulation
-------------
Another fundamental question about the design is the level of encapsulation used for the API.
At the implementation level, a lot of the work is basically to pull out all of the needed information from the THD object/context. The API I propose tries to _not_ expose the THD to consumers. Instead it provides accessor functions for all the bits and pieces relevant to
of course
each replication event, while the event class itself likely will be more or less just an encapsulated THD.
So an alternative would be to have a generic event that was just (type, THD). Then consumers could just pull out whatever information they want from the THD. The THD implementation is already exposed to storage engines. This would of course greatly reduce the size of the
no, it's not. THD is not exposed to engines (unless they define MYSQL_SERVER but then it's not our problem), they use accessor functions.
API, eliminating lots of class definitions and accessor functions. Though arguably it wouldn't really simplify the API, as the complexity would just be in understanding the THD class.
For now, the API is proposed without exposing the THD class. (Similar encapsulation could be added in actual implementation to also not expose TABLE and similar classes).
completely agree
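A minimal sketch of the encapsulation idea, with a stand-in THD and invented accessor names (the real event classes and accessors are what the API would define):

/*
  Sketch only: consumers see accessor methods, never the THD itself.
*/
#include <stdio.h>

/* Stand-in for the server's THD; consumers never see its definition. */
struct THD
{
  const char *query_text;
  unsigned long thread_id;
};

class rpl_event_query
{
public:
  explicit rpl_event_query(const THD *thd) : m_thd(thd) {}

  /* Accessors expose only the bits relevant to this event type. */
  const char *query() const       { return m_thd->query_text; }
  unsigned long thread_id() const { return m_thd->thread_id; }

private:
  const THD *m_thd;    /* private context, not reachable by consumers */
};

int main()
{
  THD thd= { "UPDATE t SET a=1", 42 };
  const rpl_event_query ev(&thd);   /* events are passed around const */
  printf("thread %lu ran: %s\n", ev.thread_id(), ev.query());
  return 0;
}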
-----------------------------------------------------------------------
Low-Level Design
A consumer is implemented as a virtual class (interface). There is one virtual function for every event that can be received. A consumer would derive from
hm. This part I don't understand. How would that work ? A consumer wants to see a uniform stream of events, perhaps for sending them to a slave. Why would you need different consumers and different methods for different events ? I'd just have one method, receive_event(rpl_event_base *)
the base class and override methods for the events it wants to receive.
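For illustration, a sketch of what such a consumer interface could look like; all names are invented here, and the alternative raised above would collapse the per-event methods into a single receive_event(rpl_event_base *):

/*
  Sketch only: one virtual method per event type, with "not interested"
  as the default, and 0 meaning success.
*/
#include <stdio.h>

class rpl_event_query;     /* event classes defined elsewhere by the API */
class rpl_event_row;

class rpl_consumer
{
public:
  virtual ~rpl_consumer() {}
  virtual int event_query(const rpl_event_query *) { return 0; }
  virtual int event_row(const rpl_event_row *)     { return 0; }
};

/* A consumer derives from the base class and overrides only the events
   it wants to receive. */
class binlog_writer : public rpl_consumer
{
public:
  virtual int event_query(const rpl_event_query *)
  {
    printf("would write a query event to the binlog\n");
    return 0;
  }
};

int main()
{
  binlog_writer w;
  w.event_query(NULL);   /* stand-in call; normally made by a generator */
  return 0;
}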
There are methods for a consumer to register itself to receive events from each generator. I still need to find a way for a consumer in one plugin to register itself with a generator implemented in another plugin (eg. PBXT engine-level replication). I also need to add a way for consumers to de-register themselves.
Let's say that all generators are hard-coded and statically compiled in. You can think about how to dynamically register them (e.g. pbxt) later.
The current design has consumer callbacks return 0 for success and error code otherwise. I still need to think more about whether this is useful (ie. what is the semantics of returning an error from a consumer callback).
Each event passed to consumers is defined as a class with public accessor methods to a private context (which is mostly the THD).
My intention is to make all events passed around const, so that the same event can be passed to each of multiple registered consumers (and to emphasise that consumers do not have the ability to modify events). It still needs to be seen whether that const-ness will be feasible in practice without very heavy modification/constification of existing code.
-----------------------------------------------------------------------

/* Virtual base class for generated replication events.
This is the parent of events generated from all kinds of generators. Only child classes can be instantiated.
This class can be used by code that wants to treat events in a generic way, without any knowledge of event details. I still need to decide whether such generic code is sensible.
sure it is. write event to binlog. send it to a slave. add a checksum, encrypt, compress - all these consumers can treat an event as an opaque stream of bytes.
*/
class rpl_event_base
{
  ...

  int materialise(int (*writer)(uchar *data, size_t len, void *context)) const;

  ...

Also, I still need to think about whether it is at all useful to be able to generically materialise an event at this level. It may be that any binlog/transport will in any case need to understand more of the format of events, so that such materialisation/transport is better done at a different layer. (A writer-callback sketch follows the class declaration below.)
Right, I'm doubtful too. Say, to materialize a statement level event you need to know what exactly bits of the context you want to include. When replicating to MariaDB it's one set, when replicating to an identically configured MariaDB of the same version it's another set, and when replicating to, say, DB2, it's probably a different (larger) set.
};
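A sketch of how a generic consumer might use the materialise() hook with a writer callback of the declared signature, treating the event as an opaque byte stream; the buffer type and callback name are invented for the example:

/*
  Sketch only: a callback matching the writer signature above, appending
  whatever bytes it is handed into an in-memory buffer.
*/
#include <stdio.h>
#include <string.h>

typedef unsigned char uchar;

struct my_buffer
{
  uchar data[1024];
  size_t used;
};

static int writer_to_buffer(uchar *data, size_t len, void *context)
{
  my_buffer *buf= (my_buffer *)context;
  if (buf->used + len > sizeof(buf->data))
    return 1;                      /* non-zero signals an error */
  memcpy(buf->data + buf->used, data, len);
  buf->used+= len;
  return 0;
}

int main()
{
  my_buffer buf= { {0}, 0 };
  /* A real event would invoke this callback (one or more times) from its
     materialise() implementation; here we call it directly to show the
     contract. */
  uchar chunk[]= "opaque event bytes";
  writer_to_buffer(chunk, sizeof(chunk) - 1, &buf);
  printf("%zu bytes buffered\n", buf.used);
  return 0;
}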
/* The global transaction id is unique cross-server.
It can be used to identify the position from which to start a slave replicating from a master.
This global ID is only available once the transaction is decided to commit by the TC manager / primary redundancy service. This TC also allocates the ID and decides the exact semantics (can there be gaps, etc); however the format is fixed (cluster_id, running_counter).
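A sketch of the fixed (cluster_id, running_counter) format; the field types and the comparison helper are my assumptions, not part of the specification:

/*
  Sketch only: the TC allocates the counter at commit time; whether gaps
  may exist is up to the TC, as noted above.
*/
#include <stdio.h>

struct rpl_global_trx_id
{
  unsigned long cluster_id;            /* identifies the originating cluster */
  unsigned long long running_counter;  /* allocated by the TC at commit time */
};

/* Ordering within one cluster: useful for "start replicating from here". */
static int gtid_comes_before(const rpl_global_trx_id *a,
                             const rpl_global_trx_id *b)
{
  return a->cluster_id == b->cluster_id &&
         a->running_counter < b->running_counter;
}

int main()
{
  rpl_global_trx_id start=   { 1, 100 };
  rpl_global_trx_id current= { 1, 101 };
  printf("start precedes current: %d\n", gtid_comes_before(&start, &current));
  return 0;
}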
uhm. XID format is defined by the XA standard. An XID consists of

 - format ID (unsigned long)
 - global transaction ID - up to 64 bytes
 - branch qualifier - up to 64 bytes

As your transaction id is smaller, you will need to consider XID a part of the "context" - in cases where XID was generated externally. Same about binlog position - which is a "transaction id" in the MySQL replication. It doesn't fit into your scheme, so it will have to be a part of the context. And unless the redundancy service will be allowed to ignore your transaction ids, MySQL native replication will not fit into the API.

Regards,
Sergei