Greetings all,
I am running MariaDB 10.2.16 on CentOS in AWS and am seeing
sporadic cluster partitioning and rejoining with no apparent
cause.
- I have nodes in 3 different AWS availability zones in a
single Galera cluster.
- Monitoring the logs, I see this message:
    Jul 29 05:33:53 server01 mysqld: 2018-07-29 5:33:53
    139633883080448 [Note] WSREP: (eabb848a,
    'tcp://0.0.0.0:4567') connection to peer 392b9516 with addr
    tcp://172.31.17.60:4567 timed out, no messages seen in PT3S
- I have tried forcing a 1500-byte MTU, as some other sources
mentioned that jumbo frames could negatively impact Galera
replication.
- Running prolonged packet captures between nodes, I cannot find
anything else wrong: network connectivity isn't interrupted and
no service restarts occur.
- These partition events happen multiple times per day.
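
For anyone who wants to compare notes, here is a minimal sketch of how
the timeout messages could be counted per peer. The regular expression
is derived only from the single log line quoted above, so other Galera
versions may word the message differently. The function names and the
log path in the comment are hypothetical.

```python
import re
from collections import Counter

# Pattern based on the one log line quoted in this post (assumption:
# each message appears on a single line in the real log file).
TIMEOUT_RE = re.compile(
    r"WSREP: \((?P<local>[0-9a-f]+), '[^']*'\) "
    r"connection to peer (?P<peer>[0-9a-f]+) "
    r"with addr (?P<addr>\S+) timed out, "
    r"no messages seen in (?P<period>\S+)"
)


def count_peer_timeouts(lines):
    """Count timeout messages per (peer id, peer address) pair."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("peer"), match.group("addr"))] += 1
    return counts


def summarize(path):
    """Print one line per peer, most frequent first.

    Example call (the path is an assumption, adjust to your syslog):
        summarize("/var/log/messages")
    """
    with open(path, errors="replace") as log:
        for (peer, addr), n in count_peer_timeouts(log).most_common():
            print(f"{n:6d}  {peer}  {addr}")
```

If the counts cluster on one peer or one availability zone, that would
at least narrow down where to look.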
Has anyone seen this sporadic cluster disconnect and rejoin issue
in a similar environment? I did not notice this behavior on 10.1.
Any help is much appreciated.
-Ryan