On Tue, 23 Mar 2010 10:12:53 +0200, Henrik Ingo <henrik.ingo@avoinelama.fi> wrote:
Meta discussion first, replication discussion below :-)
<cut>

I guess we can consider meta-discussion closed for now unless someone wants to add to it. I'm content ;)
So those are the requirements I could derive from having NDB use our to-be-implemented API. My conclusion from the above is that we should consider adding to the model the concept of a transaction group, where:

-> the engine (or MariaDB server, for multi-engine transactions?) MAY provide information about which transactions were committed within the same group.
-> if such information was provided, a redundancy service MAY process transactions inside a group in parallel or out of order, but MUST make sure that all transactions in transaction group G1 are processed/committed before the first transaction in G2 is processed/committed.
Well, that's a pretty cool concept. One way to describe it is "controlled eventual consistency". But does the redundancy service have to know about it?
If the redundancy service does not know about it, how would the information be transmitted by it? For instance, take the example of the binlog, which is a redundancy service in this model. If it supported this information (which it MAY do), it would of course have to save it in some format in the binlog file.
First of all, these groups are just superpositions of individual atomic transactions. That is, this CAN be implemented on top of the current model.
Yes, this is the intent.
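(To make the intent concrete, here is a minimal sketch of the ordering rule. All names are invented; this is not the actual API, just an illustration: transactions within a group commit in parallel, with a barrier at each group boundary.)

    // Sketch only: invented types, not the MariaDB/NDB API.
    #include <cstdio>
    #include <thread>
    #include <vector>

    struct Txn { long id; long group_id; };

    // Stand-in for the engine committing one replicated transaction.
    static void apply_txn(Txn t) {
        std::printf("committed txn %ld (group %ld)\n", t.id, t.group_id);
    }

    int main() {
        // The stream is ordered by group; within a group, any order is legal.
        std::vector<Txn> stream = {{1,1},{2,1},{3,1},{4,2},{5,2}};
        std::vector<std::thread> in_flight;
        long current = stream.front().group_id;
        for (Txn t : stream) {
            if (t.group_id != current) {
                // Barrier: everything in G1 must be committed before the
                // first transaction of G2 starts.
                for (auto &th : in_flight) th.join();
                in_flight.clear();
                current = t.group_id;
            }
            in_flight.emplace_back(apply_txn, t);  // parallel within a group
        }
        for (auto &th : in_flight) th.join();
        // Note: with parallel apply, crash recovery needs the set of already-
        // committed txn ids in the current group, not just one last id.
    }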
Secondly, transaction applying is done by the engine, so the engine or the server HAS to have support for this, both on the master and on the slave side. So why not keep the redundancy service API free from it altogether? Consider this scheme:
    Database Server   |   Redundancy Service
    (database data)   |   (redundancy information)
                      |
                Redundancy API
The task of redundancy service is to store and provide redundancy information that can be used in restoring the database to a desired state. Keeping the information and using it are two different things. The purpose of API is to separate details of one part of the program from the logic of another. So I'd keep the model and the API as simple and as free from the server as possible.

What it means here: redundancy service stores atomic database changes in a certain order, and it guarantees that it will return these changes in the same order. This is sufficient to restore the database to any state it had. It is up to the server in what order it will apply these changes and whether it wants to skip some states. (This assumes that the changesets are opaque to the redundancy service and the server can include whatever information it wants in them, including ordering prefixes.)
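(A rough sketch of how thin the service's contract could then be. The names are invented; this is just the store-in-order/return-in-order guarantee with opaque changesets, written out in code.)

    // Sketch only: not a proposed interface, just the contract in code form.
    #include <cstdint>
    #include <vector>

    struct Changeset {
        uint64_t seqno;              // ordering prefix assigned by the server
        std::vector<uint8_t> data;   // opaque to the redundancy service
    };

    class RedundancyService {
    public:
        // The order of store() calls defines the order of replay.
        void store(const Changeset &cs) { log_.push_back(cs); }
        // Returns the stored changes in exactly the order they were stored.
        const std::vector<Changeset> &replay() const { return log_; }
    private:
        std::vector<Changeset> log_;
    };

    int main() {
        RedundancyService rs;
        rs.store({1, {0x01}});
        rs.store({2, {0x02}});
        return rs.replay().front().seqno == 1 ? 0 : 1;  // order preserved
    }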
Ok, this is an interesting distinction you make.
So in current MySQL/MariaDB, one place where transactions are applied to a replica is the slave SQL thread. Conceptually I've always thought of this as "part of replication code". You propose here that this should be a common module on the MariaDB server side of the API, rather than part of each redundancy service.
Yes.
I guess this may make sense.
Well, it is of course a matter of debate, but not all of the redundancy-related code has to be encompassed by the redundancy API. The main purpose of an API is to hide implementation details, and it goes both ways: we want to hide the redundancy details from the server, and likewise we want to hide the server details from the redundancy service. Thus flexibility and maintainability are achieved. And the thinner the API, the better. That is one of the reasons for identifying the model - it is the best way to see what this API should contain. To put it another way, there are APIs and there is integration code that holds them together. Like, for example, the code that we exchanged with Kristian.
This opens up a new field of questions related to the user interface of all this. Typically, or "how things are today", a user will initiate replication/redundancy related events from the side of the redundancy service. E.g. if I want to set up MySQL statement-based replication, there is a set of commands to do that. If I want to recover the database by replaying the binlog file, there is a set of binlog-specific tools to do that. Each redundancy service solves some problems from its own specific approach, and provides a user interface for those tasks. So I guess at some point it will be interesting to see what the command interface to all this will look like, and whether I'd use something specific to the redundancy service or some general MariaDB command set to make replication happen.
It depends not so much on where you draw the API line as on what aspects of the model you want to expose to the user. Most probably - all. Thus we'll need the ability to create a replication set, add plugins to its stack (perhaps first create the stack), and configure individual plugin instances. Setting variables is definitely not enough for that, so you'll need either a special set of commands, something along the lines of GRANT, or, considering that replication configuration tends to be highly structured and you'll keep it in tables, a special (don't laugh yet) storage engine through which you will be able to modify table contents using regular SQL, and which will in turn make the corresponding API calls. I think there could be a number of benefits in such an arrangement, although I'm not sure about performance.
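(Something like this toy model, to show what I mean - the names and the mapping are entirely made up: a write to a config table row turns into a call on the plugin stack.)

    // Toy illustration only: not a real storage engine, invented names.
    #include <cstdio>
    #include <map>
    #include <string>
    #include <utility>

    // Stand-in for the plugin-facing side of the redundancy API.
    static void plugin_set_option(const std::string &plugin,
                                  const std::string &option,
                                  const std::string &value) {
        std::printf("plugin %s: %s = %s\n",
                    plugin.c_str(), option.c_str(), value.c_str());
    }

    // The "engine": writing a (plugin, option, value) row becomes an API
    // call instead of a write to disk.
    struct ConfigTable {
        void write_row(const std::string &plugin, const std::string &option,
                       const std::string &value) {
            rows_[{plugin, option}] = value;
            plugin_set_option(plugin, option, value);  // side effect of the write
        }
        std::map<std::pair<std::string, std::string>, std::string> rows_;
    };

    int main() {
        ConfigTable t;
        // What e.g. "UPDATE replication_config SET value = '2' WHERE ..."
        // could map to:
        t.write_row("binlog", "sync_period", "2");
    }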
At least the application of replicated transactions certainly should not be part of each storage engine. From the engine point of view, applying a set of replicated transactions should be "just another transaction". For the engine it should not matter if a transaction comes from the application, mysqldump, or a redundancy service. (There may be small details: when the application does a transaction, we need a new global txn id, but when applying a replicated transaction, the id is already there.)
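(The parenthetical point fits in a few lines - a hypothetical commit path with invented names: the only asymmetry between a local and a replicated transaction is who supplies the global txn id.)

    // Sketch only: invented commit path, not server code.
    #include <cstdint>
    #include <cstdio>

    static uint64_t next_global_id = 1;

    // id == 0: local transaction, allocate a fresh global id.
    // id != 0: replicated transaction, the id arrived with the changeset.
    static void commit(uint64_t preassigned_id) {
        uint64_t id = preassigned_id ? preassigned_id : next_global_id++;
        std::printf("commit, global txn id %llu\n", (unsigned long long)id);
    }

    int main() {
        commit(0);   // from the application
        commit(42);  // replayed from the redundancy service
    }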
Certainly. I think this goes without question. What I meant back there was that either the engine or the server should be capable of parallel (out-of-order is interesting only if it is parallel, right?) applying, and for the purposes of recovery it will no longer be enough for the engine to just maintain the last committed transaction ID; it'll have to keep the list of uncommitted transactions from the last group.

-- 
Alexey Yurchenko, Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011