On Sat, 20 Mar 2010 13:52:47 +0200, Henrik Ingo <henrik.ingo@avoinelama.fi> wrote:
On Wed, Mar 17, 2010 at 9:01 PM, Alex Yurchenko <alexey.yurchenko@codership.com> wrote:
The problem is that you cannot really design and program by use cases, unorthodox as it may sound. You cannot throw an arbitrary bunch of use cases as input and get code as output (that is in a finite time and of finite quality). Whether you like it or not, you always program some model.
Uh, I'm not sure I can accept this proposition. At least it seems contradictory to MariaDB's vision of being a practical, user- and customer-driven database.
I do understand the desire to marry marketing to software design, but they are simply unrelated areas of human activity. "Computer science" is called "science" because there are real laws which no marketing genius can invalidate. So YMMV.
As I see it, for real world applications, you should always start with use cases. But it is ok if you want to come back to me and say that a subset of use cases should be discarded because they are too difficult to service, or even contradict each other. But just saying that you'd like to implement an abstract model without connection to any use cases sounds dangerous to me.
I never suggested implementing a model without connection to use cases, and I believe I went to sufficient lengths to explain how the proposed model can satisfy a broad range of use cases. What I was saying is that you are always programming a model, not use cases, and therefore anything that you want to implement must be expressed in terms of the model. In this connection, saying that you have a use case that does not need linearly ordered commits really means nothing. Either you need to propose another model, live with linearly ordered commits, or drop the case. Either way it has no effect on the design of this model implementation, because linearly ordered commits IS the model. You cannot throw them out without breaking the rest of the concept. So much for the usefulness of use cases in high-level design: some of them fit, some of them don't.
I'm also a fan of abstract thinking though. Sometimes you can get great innovations from starting with a nice abstract model, and then ask yourself which real world problems it would (and would not) solve.
And that's exactly what I'm trying to do in this thread - start with a model, not use cases.
Either way, you end up anchoring yourself in real world use cases.
Well, when you start with a model, it means that you use it as a reference stick to accept or reject use cases, doesn't it? So that makes the model an anchor. And leaves use cases only as a means to see how practical the model is. And there is another curious property to models: the more abstract the model is (i.e. the less it is rooted in use cases), the more use cases it can satisfy. Once you stop designing specifically for asynchronous replication, you find out that the same scheme works for synchronous too.
So now we have a proposed model based on Redundancy Sets, linearly ordered global transaction IDs and ordered commits. We pretty much understand how it will work and what sort of redundancy it will provide, and, as you agreed, it is easy to use for recovery and node joining. It satisfies a whole bunch of use cases, even those where ordering of commits is not strictly required. Perhaps we won't be able to have some optimizations that we could have had without ordering of commits, but the benefit of such optimizations is highly questionable IMO. MySQL/Galera is a practical implementation of such a model, maybe not exactly what we want to achieve here, but it gives a good estimate of performance, and the performance is good.
Back on track: So the API should of course implement something which has as broad applicability as possible. This is the whole point of questioning you, since now you have just suggested a model which happens to nicely satisfy Galera's needs :-)
Well, it may seem that way because Galera is the only explicit implementation of that model. But the truth is that Galera is possible only because this model was explicitly followed. And this model didn't come out of thin air. It is a result of years of research and experience - not only ours. For example, MySQL|MariaDB already implements a large portion of the proposed model by representing the evolution of a database as a _series_ of atomic changes recorded in a binlog. In fact it had global transaction IDs from day one. They are just expressed in a way that makes sense only in the context of a given file on a given server. Had they been recognized as global transaction IDs, implementing a mapping from a file offset to an ordinal number would have been below trivial. Then we would not be having 3rd party patches applicable only to MySQL 5.0. (Let's face it, global transaction IDs in master-slave replication are so trivial they are practically built in.) The reason why there is no nice replication API in MariaDB yet is that this model was never explicitly recognized. And an API is a description of a model. You cannot describe what you don't recognize ;) So in reality I am not proposing anything new or specific to Galera. I'm just suggesting to recognize what you already have there (and proposing the abstractions to express it). <cut>
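To illustrate the file-offset-to-ordinal mapping mentioned above, here is a minimal sketch in Python. It is not MySQL or MariaDB code: the class name PositionToSeqno, its methods, and the representation of a binlog position as a (file index, offset) pair are all assumptions made for the example. It only shows that once commit positions are known to be strictly increasing, assigning each one an ordinal number is a matter of counting.

```python
from bisect import bisect_left


class PositionToSeqno:
    """Hypothetical index: maps (binlog file index, offset) to an ordinal number."""

    def __init__(self, first_seqno=1):
        self._positions = []      # commit positions, kept in increasing order
        self._first = first_seqno

    def record_commit(self, file_index, offset):
        """Register the next committed transaction and return its ordinal."""
        pos = (file_index, offset)
        if self._positions and pos <= self._positions[-1]:
            raise ValueError("binlog positions must be strictly increasing")
        self._positions.append(pos)
        return self._first + len(self._positions) - 1

    def seqno_of(self, file_index, offset):
        """Translate a server-local position into the global ordinal."""
        pos = (file_index, offset)
        i = bisect_left(self._positions, pos)
        if i == len(self._positions) or self._positions[i] != pos:
            raise KeyError(pos)
        return self._first + i

    def position_of(self, seqno):
        """Translate a global ordinal back into a server-local position."""
        i = seqno - self._first
        if not 0 <= i < len(self._positions):
            raise KeyError(seqno)
        return self._positions[i]
```

The ordinal is the part that makes sense on any server; the (file, offset) pair stays a local detail of one server.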
So those are the requirements I could derive from having NDB use our to-be-implemented API. My conclusion from the above is that we should consider adding to the model the concept of a transaction group, for which:
-> the engine (or MariaDB server, for multi-engine transactions?) MAY provide information about which transactions were committed within the same group.
-> If such information was provided, a redundancy service MAY process transactions inside a group in parallel or out of order, but MUST make sure that all transactions in transaction group G1 are processed/committed before the first transaction in G2 is processed/committed.
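The transaction group rule above can be sketched roughly as follows, in Python. This is only an illustration under assumptions: the function name apply_in_groups, the (group_id, payload) tuple format and the thread pool are invented for the example and are not part of any proposed API. Transactions inside one group run in parallel; the next group does not start until the previous one has fully completed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def apply_in_groups(transactions, apply, max_workers=4):
    """Apply (group_id, payload) pairs given in commit order.

    Assumes transactions of the same group are adjacent in the input.
    Payloads of one group may be applied in parallel and in any order;
    group G2 is not started before every transaction of G1 has finished.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _, members in groupby(transactions, key=lambda t: t[0]):
            payloads = [payload for _, payload in members]
            # Consuming the map() results waits for the whole group,
            # which acts as the barrier between consecutive groups.
            list(pool.map(apply, payloads))
```

Without group information every transaction would form its own group, and the function degrades to plain in-order applying.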
Well, that's a pretty cool concept. One way to call it is "controlled eventual consistency". But does the redundancy service have to know about it?

First of all, these groups are just superpositions of individual atomic transactions. That is, this CAN be implemented on top of the current model. Secondly, transaction applying is done by the engine, so the engine or the server HAS to have support for this, both on the master and on the slave side. So why not keep the redundancy service API free from that altogether? Consider this scheme:

Database Server    |    Redundancy Service
(database data)    |    (redundancy information)
                   |
             Redundancy API

The task of the redundancy service is to store and provide redundancy information that can be used in restoring the database to a desired state. Keeping the information and using it are two different things. The purpose of an API is to separate one part of the program from the logic of another. So I'd keep the model and the API as simple and as free from the server details as possible.

What it means here: the redundancy service stores atomic database changes in a certain order and guarantees that it will return these changes in the same order. This is sufficient to restore the database to any state it had. It is up to the server in what order it will apply these changes and whether it wants to skip some states. (This assumes that the changesets are opaque to the redundancy service and the server can include whatever information it wants in them, including ordering prefixes.)
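As a rough sketch of how small such a contract could be, here is an in-memory stand-in written in Python. The class name RedundancyService and the methods append and read_from are hypothetical and exist only for this example. The service treats each changeset as opaque bytes, assigns it the next ordinal, and hands changesets back in exactly the order in which they were stored.

```python
class RedundancyService:
    """Hypothetical in-memory stand-in for an ordered changeset store."""

    def __init__(self):
        self._log = []  # opaque changesets, in the order they were stored

    def append(self, changeset):
        """Store one atomic change and return its ordinal (starting at 1)."""
        self._log.append(bytes(changeset))  # content is never interpreted
        return len(self._log)

    def read_from(self, seqno):
        """Return (ordinal, changeset) pairs from seqno onwards, in stored order."""
        if seqno < 1:
            raise ValueError("ordinals start at 1")
        return [(i + 1, cs) for i, cs in enumerate(self._log) if i + 1 >= seqno]
```

Anything like group markers or ordering prefixes would live inside the bytes, where only the server looks at them.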
We should not include the NDB internal replication in this discussion.
It was taken solely as an example of a real world use case where you may not have linearly ordered commits.

Regards,
Alex

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011