[MariaDB developers] Re: Client-server protocol improvement proposal: Connection redirection

29 Jul 2023

Hi,
...
On Jul 23, Otto Kekäläinen wrote:
...
...
...
Have you considered using AUTH_SWITCH_REQUEST for that purpose?
That would allow redirect to happen after switch to TLS and
client/server certificate validation.
One of the main use cases at the moment is to redirect clients to
another server when this current server is being shut down.
For that we need to be able to send the redirect information in the
middle of the session, not only when a connection is being
established. Both session tracking and error message approach allow
that.
Not strictly so - we don't *need* to send the redirect information in
the middle of the session. The session can just be dropped, and when
clients automatically reconnect they will get the error message (if
old client software) or do an automatic redirect to the new server (if
https://github.com/MariaDB/server/pull/2681 is merged).
First, this requires the client to perform one extra connection attempt,
just to get the redirection information. This sounds like a waste of
time. And also this is not guaranteed to work - if the server shuts
down, the socket might be already closed, so the client simply won't be
able to connect again.
It is not a waste of time as the time it takes to attempt a connection
and get an error+redirect reply is minimal. The actual delay of how
much time is "wasted" in a switchover situation comes from how quickly
the server A is able to drain all connections and server B to sync up
and get promoted to primary, and that is the time the design should
focus on optimizing.

It is guaranteed to work as long as the DBA keeps the server running
and redirecting - just like with HTTP 301 redirects.
...
Second, it aborts the session in the middle where the client might not
expect it. For example, in the middle of the transaction. Some clients
might be able to replay it, others might not. That's why I suggest
letting the client decide when it can reconnect.
The switchover is initiated and decided by the DBA controlling the
server. You cannot possibly expect it to be controlled by the client -
and in particular to be decided by the *slowest* client. The server
can allow some time for the draining of existing connections to
happen, but eventually it needs to tell old client connections to
stop, and only after the redirect has started for new connections +
all existing connections have been dropped can the actual switchover
proceed by promoting a new primary.
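
The ordering described above (redirect new connections first, allow a
bounded drain period, drop whatever is left, and only then promote) can be
sketched as a toy simulation. Everything here is hypothetical and for
illustration only: the `Server` class, the `switchover` function and the
`grace_ticks` parameter are not part of MariaDB or of the proposal.

```python
class Server:
    """Toy model of a server; not a real MariaDB interface."""

    def __init__(self, name, is_primary=False):
        self.name = name
        self.is_primary = is_primary
        self.redirect_to = None   # set once the DBA starts the switchover
        self.sessions = {}        # session id -> remaining units of work

    def accept(self, session_id, work):
        # Once redirecting, new connections are refused and pointed elsewhere.
        if self.redirect_to is not None:
            return ("redirect", self.redirect_to)
        self.sessions[session_id] = work
        return ("ok", self.name)

    def tick(self):
        # Each tick, every session finishes one unit of work and
        # disconnects on its own when it has nothing left to do.
        for sid in list(self.sessions):
            self.sessions[sid] -= 1
            if self.sessions[sid] <= 0:
                del self.sessions[sid]


def switchover(old, new, grace_ticks):
    """Run the switchover; return the ids of sessions that were force-dropped."""
    old.redirect_to = new.name        # 1. redirect new connections
    for _ in range(grace_ticks):      # 2. bounded drain period
        if not old.sessions:
            break
        old.tick()
    dropped = sorted(old.sessions)    # 3. drop whatever is still connected
    old.sessions.clear()
    old.is_primary = False            # 4. only now promote the new primary
    new.is_primary = True
    return dropped
```

In this model the grace period is set by the operator, so the slowest
client cannot delay the promotion indefinitely.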
...
...
This is similar to how HTTP 301/302 redirects work, and pretty sweet
as it allows the old server to be drained gracefully and lets clients
fall back on tried-and-tested semantics for handling connection
interruptions.
HTTP is stateless, what it does cannot always be blindly applied to the
very much stateful MariaDB client-server protocol.
And, really, I don't understand how the phrases "session can just be
dropped" and "gracefully drain the old server" can be used in adjacent
statements. There's nothing graceful about forcefully dropping the
connection.
Perhaps not directly graceful, but at least all production-grade
clients have some mechanisms to cope with network interruptions and
such. If you introduce a 'recommendation'-based redirect, it will lead
clients to implement completely novel logic, which in the worst case
leads to clients competing over which one is the last to use the old
server to minimize per-client downtime, making the system-level total
downtime in a switchover worse for all other clients.

Thus introducing a redirection that is based on the server denying new
connections will nicely align with existing error handling logic in
clients.
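
As a rough illustration of how a redirect on a refused connection could
slot into a client's existing reconnect path, here is a minimal sketch.
The names are assumptions made for the example: `RedirectError`,
`connect_with_redirect` and the `connect` callable are hypothetical and do
not reflect the actual error format or connector API in the pull request.

```python
class RedirectError(Exception):
    """Hypothetical error: the server refuses the connection and names a replacement."""

    def __init__(self, host, port):
        super().__init__(f"redirect to {host}:{port}")
        self.host = host
        self.port = port


def connect_with_redirect(connect, host, port, max_hops=3):
    """Connect, following at most max_hops redirects.

    `connect` is a caller-supplied callable (host, port) -> connection that
    raises RedirectError when the server points to another address.
    """
    for _ in range(max_hops + 1):
        try:
            return connect(host, port)
        except RedirectError as redirect:
            # Same code path as any other failed connect, except that the
            # next attempt goes to the address the server supplied.
            host, port = redirect.host, redirect.port
    raise RuntimeError("too many redirects")
```

The hop limit guards against redirect loops, much as HTTP clients cap the
number of 301/302 hops they will follow.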