Hi, Kristian!
API, eliminating lots of class definitions and accessor functions. Though arguably it wouldn't really simplify the API, as the complexity would just be in understanding the THD class.
For now, the API is proposed without exposing the THD class. (Similar encapsulation could be added in actual implementation to also not expose TABLE and similar classes).
completely agree
Ok, so some follow up questions:
1. Do I understand correctly that you agree that the API should also encapsulate TABLE and similar classes? These _are_ exposed to storage engines as far as I can see.
I think it's ok to use TABLE and Field as storage engines are using them. It would be good to encapsulate them, of course, but I'd say there's no need to try to do it at all costs.
2. If TABLE and so on should be encapsulated, there will be the issue of having iterators to run over columns, etc. Do we already have standard classes for this that could be used? Or should I model this on the iterators of the Standard C++ library, for example?
We have List and an iterator over it. Alternatively, you can return an array and let the caller iterate it any way it wants.
(I would like to make the new API fit in as well as possible with the existing MySQL/MariaDB code, which you know much better).
A consumer is implemented as a virtual class (interface). There is one virtual function for every event that can be received. A consumer would derive from
Hm. This part I don't understand. How would that work? A consumer wants to see a uniform stream of events, perhaps for sending them to a slave. Why would you need different consumers and different methods for different events?
I'd just have one method, receive_event(rpl_event_base *)
Ok, so do I understand you correctly that class rpl_event_base would have a type field, and the consumer could then down-cast to the appropriate specific event class based on the type?
    receive_event(const rpl_event_base *generic_event)
    {
      switch (generic_event->type)
      {
      case rpl_event_base::RPL_EVENT_STATEMENT_QUERY:
      {
        const rpl_event_statement_query *ev=
          static_cast<const rpl_event_statement_query *>(generic_event);
        do_stuff(ev->get_query_string(), ...);
        break;
      }
      case rpl_event_base::RPL_EVENT_ROW_UPDATE:
      {
        const rpl_event_row_update *ev=
          static_cast<const rpl_event_row_update *>(generic_event);
        do_stuff(ev->get_after_image(), ...);
        break;
      }
      ...
      }
    }

I have always disliked having such a type field and downcasting. So I tried to make an API where it was not needed. Like this:

    class my_event_consumer
    {
      int stmt_query(const rpl_event_statement_query *ev)
      {
        do_stuff(ev->get_query_string(), ...);
      }
      int row_update(const rpl_event_row_update *ev)
      {
        do_stuff(ev->get_after_image(), ...);
      }
      ...
    };
Okay, now I see what you mean. I don't like downcasting either. On the other hand, I don't want to force plugins that work on an event as a whole to implement methods for every particular type of event.

It may be possible to do both: virtual methods for every event type, as you proposed, but not abstract, with a default implementation that calls a generic receive_event(). A plugin can then implement either the family of per-type methods or the single generic one.

But if the above wouldn't work and we have to choose, I'd prefer a simpler interface with one generic receive_event().
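A rough sketch of that hybrid, just to check that it hangs together. The event classes are cut down to the bare minimum, and forwarding_consumer, query_logger and generate_sample_events() are made-up names for illustration, not part of the proposed API. It also assumes the generator knows the concrete event type and calls the typed method directly, so nobody has to downcast:

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified event classes (not the real API).
class rpl_event_base {
public:
  virtual ~rpl_event_base() {}
};

class rpl_event_statement_query : public rpl_event_base {
public:
  explicit rpl_event_statement_query(const std::string &q) : query(q) {}
  const std::string &get_query_string() const { return query; }
private:
  std::string query;
};

class rpl_event_row_update : public rpl_event_base {};

// Consumer interface: the typed methods are virtual but not abstract.
class rpl_event_consumer {
public:
  virtual ~rpl_event_consumer() {}
  // Generic catch-all; the default simply ignores the event.
  virtual int receive_event(const rpl_event_base *) { return 0; }
  // Typed methods fall back to the generic one unless overridden.
  virtual int stmt_query(const rpl_event_statement_query *ev)
  { return receive_event(ev); }
  virtual int row_update(const rpl_event_row_update *ev)
  { return receive_event(ev); }
};

// A plugin that treats all events uniformly (e.g. to forward them).
class forwarding_consumer : public rpl_event_consumer {
public:
  int forwarded = 0;
  int receive_event(const rpl_event_base *) override
  { ++forwarded; return 0; }
};

// A plugin that is only interested in statement queries.
class query_logger : public rpl_event_consumer {
public:
  std::string last_query;
  int stmt_query(const rpl_event_statement_query *ev) override
  { last_query = ev->get_query_string(); return 0; }
};

// Stand-in for a generator: it knows the concrete type of each event,
// so it calls the matching typed method and no downcast is needed.
void generate_sample_events(rpl_event_consumer *c) {
  assert(c != nullptr);
  rpl_event_statement_query q("UPDATE t SET a=1");
  rpl_event_row_update r;
  c->stmt_query(&q);
  c->row_update(&r);
}
```

With this, forwarding_consumer sees both events through receive_event(), while query_logger sees only the query and silently ignores the row update.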
One generator can be stacked on top of another. This means that a generator on top (for example row-based events) will handle some events itself (e.g. non-deterministic update in mixed-mode binlogging). Other events that it does not want to or cannot handle (for example deterministic delete or DDL) will be deferred to the generator below (for example statement-based events).
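To make the stacking concrete, here is a minimal sketch under some loud assumptions: the change struct, the wants()/name() hooks and both generator classes are invented for illustration, and a real generator would emit events rather than return a layer name. Each layer either handles a change or passes it to the layer below:

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a change to be replicated.
struct change {
  bool is_ddl;
  bool deterministic;
};

// Hypothetical generator base class with an optional layer below it.
class rpl_generator {
public:
  explicit rpl_generator(rpl_generator *below_arg) : below(below_arg) {}
  virtual ~rpl_generator() {}
  // Returns the name of the layer that handled the change,
  // or an empty string if no layer in the stack wanted it.
  std::string process(const change &c) {
    if (wants(c))
      return name();
    return below ? below->process(c) : std::string();
  }
protected:
  virtual bool wants(const change &c) const = 0;
  virtual std::string name() const = 0;
private:
  rpl_generator *below;
};

// Bottom layer: statement-based, accepts everything.
class statement_generator : public rpl_generator {
public:
  statement_generator() : rpl_generator(nullptr) {}
protected:
  bool wants(const change &) const override { return true; }
  std::string name() const override { return "statement"; }
};

// Top layer: row-based, only takes non-deterministic DML
// (as in mixed-mode binlogging) and defers the rest.
class row_generator : public rpl_generator {
public:
  explicit row_generator(rpl_generator *below_arg)
    : rpl_generator(below_arg) {}
protected:
  bool wants(const change &c) const override
  { return !c.is_ddl && !c.deterministic; }
  std::string name() const override { return "row"; }
};
```

Note that this only models the all-or-nothing case, where one layer takes the whole change.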
There's a problem with this idea. Say, Event B is nested in Event A:
  ... ... |<- Event A ... .. .. ->| .. .. ..
  * * * * *    |<- Event B ->|   * * * *
This is fine. But what about
  ... ... |<- Event A ... ->| .. .. ..
  * * * * *       |<- Event B ->| * * * *

In the latter case no event is nested in the other, and no level can simply defer to the other.
I don't know a solution for this, I'm just hoping the above situation is impossible. At least, I could not find an example of "overlapping" events.
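To pin down the terminology, the two pictures can be modelled as intervals. This is only an illustration under the assumption that each event covers a half-open range [begin, end) of some position in the stream; event_span, span_relation and classify() are made-up names:

```cpp
#include <cassert>

// Hypothetical model: an event covers the half-open range [begin, end).
struct event_span {
  long begin;
  long end;
};

enum span_relation { SPANS_DISJOINT, SPANS_NESTED, SPANS_OVERLAPPING };

// True if 'inner' lies completely within 'outer'.
static bool contains(const event_span &outer, const event_span &inner) {
  return outer.begin <= inner.begin && inner.end <= outer.end;
}

// Classify how two non-empty events relate to each other.
span_relation classify(const event_span &a, const event_span &b) {
  assert(a.begin < a.end && b.begin < b.end);
  if (a.end <= b.begin || b.end <= a.begin)
    return SPANS_DISJOINT;      // no common part at all
  if (contains(a, b) || contains(b, a))
    return SPANS_NESTED;        // first picture: one layer can defer
  return SPANS_OVERLAPPING;     // second picture: the problematic case
}
```

Only SPANS_OVERLAPPING is the case that has no obvious answer; the other two let one level defer to the other.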
Another way of thinking about this is that we have one layer above handling (or not handling) an event that can be generated below. ... So one case where this becomes a problem is if we have a multi-table update where one table is PBXT and another is not, and we are using PBXT engine-level replication on top of statement-based replication. In this case, one half of the statement-based event is handled by the layer above, but the other is not. So we cannot deal with this situation.
On the contrary, this is quite easy. Even a CREATE ... SELECT is a mix of statement-based and row-based. A simple solution would be to replicate it completely statement-based, that is, to discard the row-based part of the event. We can do that because the statement-level description of the event is sufficient: the row-based event is completely nested within a statement-based one (other, more complex solutions are possible too).

I was describing a case where events overlap, but neither one is completely nested within the other. For this case I know no solution, but I hope it is never possible in practice.

Regards,
Sergei