Recently we observed that when network and IO latency increased significantly (and, correspondingly, bandwidth dropped significantly), all nodes in a 3-node MariaDB Galera cluster reported the other nodes as partitioned.

We understand that by default Galera is configured to wait 6 seconds before declaring other nodes partitioned. Is there any additional documentation on the performance of MariaDB/Galera with respect to network/IO latency and bandwidth? Are there hard limits beyond which a cluster is known to fail, and soft limits beyond which performance is known to degrade?
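
To help frame the question, here is a minimal sketch of how the group-communication timeouts could be pulled out of a node's `wsrep_provider_options` string so they can be compared across nodes. The sample string and its values are illustrative assumptions, not output captured from our cluster, and the helper names are hypothetical. We are also assuming that the `evs.*` options are the ones governing when a peer is suspected or declared inactive.

```python
import re

# Illustrative sample only (an assumption, not captured from a real node).
# The provider options arrive as one "key = value; key = value" string.
SAMPLE_OPTIONS = (
    "evs.keepalive_period = PT1S; evs.suspect_timeout = PT5S; "
    "evs.inactive_timeout = PT15S; gcache.size = 128M"
)

# Matches simple ISO 8601 time durations such as PT5S, PT7.5S or PT1M30S.
_DURATION = re.compile(
    r"^PT(?:(\d+(?:\.\d+)?)H)?(?:(\d+(?:\.\d+)?)M)?(?:(\d+(?:\.\d+)?)S)?$"
)


def parse_provider_options(raw):
    """Split a 'key = value; ...' string into a dict of stripped strings."""
    options = {}
    for item in raw.split(";"):
        if "=" not in item:
            continue  # skip empty or malformed fragments
        key, value = item.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def duration_seconds(value):
    """Return the duration in seconds, or None if value is not a duration."""
    match = _DURATION.match(value)
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def evs_timeouts(raw):
    """Collect every evs.* option whose value parses as a duration."""
    result = {}
    for key, value in parse_provider_options(raw).items():
        if key.startswith("evs."):
            seconds = duration_seconds(value)
            if seconds is not None:
                result[key] = seconds
    return result


if __name__ == "__main__":
    for name, seconds in sorted(evs_timeouts(SAMPLE_OPTIONS).items()):
        print(f"{name}: {seconds:g}s")
```

Running something like this against each node's actual option string would show whether the timeouts in effect match the default we believe is in place.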

Thanks,

Rob and Lyle
Cloud Foundry Core Services