Hi Sergei,
On 2024/08/02 11:30, Sergei Golubchik via discuss wrote:
> Hi, Marc,
>
> On Aug 02, Marc wrote:
>>> If the server will delay enforcing of max_connections (that is, the
>>> server will not reject connections above max_connections at once),
>>> then this user in the above scenario will open all possible
>>> connections your OS can handle and the computer will become
>>> completely inaccessible.
>>
>> The idea about this change is to have a more useful and expected
>> implementation of max_user_connections and max_connections.
>> Currently I am using max_connections not for what it is supposed to
>> be used for, just because max_user_connections is not doing as much
>> as it 'should'.
>>
>> Hi Sergei, Is this something you are going to look into? I am also
>> curious about this delay between the first packet and the packet
>> with the username. I can't imagine that being such a problem; to me
>> this looks feasible.
>
> I'm afraid I don't understand your use case. There are, basically,
> three limits now: max_user_connections, max_connections, the OS
> limit. An ordinary user would connect many times, hit
> max_user_connections and stop. Or would keep connecting and get
> disconnects because of max_user_connections. A malicious user would
> connect and wouldn't authenticate; this would exhaust
> max_connections and nobody would be able to connect to the server
> anymore. max_user_connections won't help here.
Let me explain this a different way. It doesn't address what I
understand Marc's use-case to be directly, but it relates to it, and I
reckon what follows is a fair compromise, because I see where Marc is
coming from. We've also recently had some interesting experiences with
a remote party effectively DOS'ing themselves out of connecting to one
of our haproxy instances (https), so not directly related.
TCP connection establishment is phase one. This is limited by the
operating system's receive queues (yes, there are two of them: one for
connections in SYN_RECV, and one for connections that are ESTABLISHED
but not yet accept()ed), but on Linux the SYN_RECV queue can be
exceeded if the system is configured to use SYN cookies. Bad idea?
Possibly, as it prevents TCP options from being used, but it does
allow a connection to be established at all in the case of a SYN
flood, so IMHO switching to SYN cookies once the SYN_RECV queue is
full is a good idea, since a degraded but working connection is
significantly better than no connection at all. This isn't MariaDB
specific, nor does it relate to Marc's request, but it does give some
background. It's the same underlying issue, just at a different layer.
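For reference, the Linux knobs involved in those two queues and the
SYN-cookie fallback are the following sysctls (values here are purely
illustrative, not tuning advice):

```
# sysctl sketch -- illustrative values only
net.ipv4.tcp_max_syn_backlog = 1024  # SYN_RECV queue depth
net.core.somaxconn = 4096            # cap on the listen() backlog
                                     # (ESTABLISHED, not yet accept()ed)
net.ipv4.tcp_syncookies = 1          # fall back to SYN cookies when
                                     # the SYN queue overflows
```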
Once MariaDB accept()s the connection, I understand MariaDB counts it
against max_connections. If max_connections is then exceeded, the new
connection is dropped. This can trivially deny service to legitimate,
well-behaved clients.
This provides for a very, very simple DOS situation. Simply open a
connection from the remote side and never send anything. Eventually
MariaDB will close this connection (I'm not sure how long that takes),
dropping the connection count again, and only then can legitimate
users connect again. As I understand it, this is what Marc is
experiencing. There are many reasons why this could happen under
*normal operations*, but you're right, this is a "badly behaving
client". You're also right that not limiting this pre-auth would just
move the problem to the operating system limits.
Our use-cases are mostly controlled; we have one case where
unfortunately MariaDB needs to be world-exposed, we've got no way
around that, and this would apply to us there as well. Fortunately we
have other mechanisms in place to rate-limit how fast untrusted
sources can connect, which helps to mitigate this. I think one could
also front this with a tool like haproxy, which can be configured to
close the connection if the client side doesn't send anything within
the first X ms; that could be protection layer two.
That said, I agree with Marc that the situation can be improved on the
MariaDB side. He's worried about a mix of good and bad actors from the
same IP address; our use-case involves different IPs, but it's the
same underlying problem. MariaDB can (in my opinion) help in both
cases.
I would suggest having a separate max_unauthenticated_connections
counter, and an authenticate_timeout variable (no more than 2s for
most use-cases; I can't imagine this should be higher than 5s in any
situation).
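To illustrate the intent (this is my sketch, not anything MariaDB
actually implements; the names and the timeout value are assumptions),
an authenticate_timeout would behave roughly like this: wait for the
client's first packet, and close the connection if nothing arrives in
time.

```python
import asyncio

AUTH_TIMEOUT = 2.0  # hypothetical authenticate_timeout, in seconds


async def await_auth_packet(reader, writer, timeout=AUTH_TIMEOUT):
    """Wait for the client's first (auth) packet.

    Returns the packet bytes, or None after closing the connection
    if the client sent nothing within the timeout.
    """
    try:
        return await asyncio.wait_for(reader.read(4096), timeout)
    except asyncio.TimeoutError:
        writer.close()  # client never authenticated in time
        return None
```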
I would probably run with something like:
max_connections = 5000
max_user_connections = 250
max_unauthenticated_connections = 500
Combining this with firewall rate-limits for untrusted sources, one
can then get a fairly well protected setup. We normally allow burst
100, max 1/s new connections by default; with a 5s timeout on the
MariaDB side this permits a "bad player" a maximum of 100 connections
initially, but over time a maximum of 5 concurrent connections with
which to DOS, per source IP. So even with my suggestion one can run
into trouble, but at least it becomes harder. One could adjust the
relevant rate limits to something like 1/min with a burst of 500, or
have small + large buckets where connections over time have to pass
through both, but this gets complicated, and if someone wants to DOS
you that desperately, there honestly isn't much you can do, but see
below.
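The arithmetic above can be sanity-checked with a toy simulation (my
own model, nothing MariaDB-specific): a token bucket with burst 100
refilled at 1/s, against an attacker who opens connections as fast as
permitted while the server drops each unauthenticated connection after
5 s.

```python
def simulate(rate_per_s=1, burst=100, auth_timeout_s=5, horizon_s=600):
    """Return (peak, steady) concurrent unauthenticated connections an
    attacker can hold against a token-bucket connection rate limit."""
    tokens = float(burst)
    open_since = []          # open timestamps of current connections
    peak = 0
    for t in range(horizon_s):
        tokens = min(burst, tokens + rate_per_s)  # refill the bucket
        opened = int(tokens)                      # attacker opens max
        tokens -= opened
        open_since.extend([t] * opened)
        # server drops anything unauthenticated for auth_timeout_s
        open_since = [o for o in open_since if t - o < auth_timeout_s]
        peak = max(peak, len(open_since))
    return peak, len(open_since)
```

With these defaults the attacker gets an initial spike of roughly the
burst size, after which the count settles at rate x timeout = 5
concurrent connections, i.e. the "max 5 connections to DOS" figure.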
Once max_unauthenticated_connections is reached, I can think of two
possible strategies:
1. Drop the new connection.
2. Drop the connection that has been waiting longest to authenticate.
Each has pros and cons. A possible third option would be to drop the
connection that has been waiting longest from the same source, and
otherwise revert to 1 or 2.
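The third option can be sketched as follows (purely illustrative; the
class and limit are hypothetical, and an ordered mapping stands in for
the server's list of pending connections):

```python
from collections import OrderedDict


class UnauthTracker:
    """Sketch of strategy 3: when the unauthenticated-connection budget
    is full, evict the longest-waiting connection from the same source
    IP, falling back to the longest-waiting connection overall
    (strategy 2)."""

    def __init__(self, limit):
        self.limit = limit               # max_unauthenticated_connections
        self.pending = OrderedDict()     # conn_id -> source_ip, oldest first

    def connect(self, conn_id, source_ip):
        """Register a new unauthenticated connection.

        Returns the conn_id that was evicted to make room, or None."""
        evicted = None
        if len(self.pending) >= self.limit:
            # prefer the oldest pending connection from the same source
            victim = next((cid for cid, ip in self.pending.items()
                           if ip == source_ip), None)
            if victim is None:           # fall back: oldest overall
                victim = next(iter(self.pending))
            del self.pending[victim]
            evicted = victim
        self.pending[conn_id] = source_ip
        return evicted

    def authenticated(self, conn_id):
        """Auth finished: the connection no longer counts here."""
        self.pending.pop(conn_id, None)
```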
In this scenario I don't think it matters significantly whether
unauthenticated connections count towards max_connections or not, but
my gut says they should not.
To further mitigate the multiple-sources case it would be great if we
could get logs specifically for authentication results, i.e., for each
incoming connection log exactly one line indicating the source IP and
the auth result, e.g.:
Connection auth result: user@a.b.c.d accepted.
Connection auth result: a.b.c.d timed out.
Connection auth result: user@a.b.c.d auth failed.
Of course a.b.c.d could also be IPv6 dead::beef, as the case may
be.
One can then feed this into fail2ban or similar to mitigate further.
There might be a way to log this already that I just haven't found;
I've only spent a very superficial amount of time looking for it.
Given this, we could raise the rate limits for a source once a
successful authentication happens from it (say, to our current
defaults), and run with even lower initial defaults than we use now
(say burst 10, 1/min or so). Too many auth failures or timeouts in the
absence of a successful auth could be used to outright ban source IPs
for some time.
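If such one-line auth-result logging existed, a fail2ban filter over
it might look something like this (entirely hypothetical, since the
log format above is only a proposal and MariaDB does not emit it
today):

```
# /etc/fail2ban/filter.d/mariadb-auth.conf (hypothetical)
[Definition]
failregex = ^Connection auth result: <HOST> timed out\.$
            ^Connection auth result: \S+@<HOST> auth failed\.$
```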
> After your suggestion of delayed max_connections check - an ordinary
> user would still connect max_user_connections times, nothing would
> change for him. A malicious user, not stopped by max_connections
> anymore, would completely exhaust OS capability for opening new
> connections, making the whole OS inaccessible.

Bingo. You're spot on. But the current mechanism does allow for a very
effective and trivial denial of service on any remote server.

> That's what I mean - I don't understand your use case. It doesn't
> change much if all users behave, and it makes the situation much
> worse if a user is malicious. So, in what use case would your change
> be an improvement?
I hope the above helped.
Kind regards,
Jaco