[Maria-discuss] MaxScale behaviour differs for TLS and non-TLS clients
Hi,

I'm (still) testing MaxScale on Ubuntu 16.04, fronting a Galera cluster and set up to connect only to the master node. The cluster comprises two MariaDB 10.1 instances on different servers, with a Galera Arbitrator instance running on the MaxScale server. I'm testing with the MariaDB client (mysql) from a fourth machine.

My test scenario is to see what the client experiences if I stop and restart a MariaDB node part-way through a transaction. I start with the "slave" node, to get a baseline for comparison, before doing the same with the master node. However, the baseline case gives different results depending on whether the client connects via TLS or not.

If I connect to MaxScale via TLS, the connection is dropped when the "slave" node comes back up (but not when it goes down). In a network packet trace I can see that, just after the slave comes back up, MaxScale sends an encrypted packet to the mysql client, then sends a MySQL Quit command to the master node before disconnecting. The syslog contains the line "[galeramon] There are no cluster members".

If I connect to MaxScale without TLS, the connection remains stable no matter how many times the slave node goes down and comes back up, and the "galeramon" line doesn't appear in the syslog. (I discovered this when I disabled TLS in order to see what the encrypted packet sent to the client was... I still don't know what it is!)

Has anyone else come across this behaviour?
MaxScale is configured as follows (the commented-out configuration is uncommented when connecting via TLS):

[dbnode1]
type=server
address=172.16.1.22
port=3306
protocol=MySQLBackend
priority=1

[dbnode2]
type=server
address=172.16.1.23
port=3306
protocol=MySQLBackend
priority=2

[Galera Monitor]
type=monitor
module=galeramon
servers=dbnode1,dbnode2
user=galeramon
passwd=galeramon
monitor_interval=1000
available_when_donor=true
use_priority=true

[Galera Service]
type=service
router=readconnroute
router_options=master
servers=dbnode1,dbnode2
user=galeramon
passwd=galeramon

[MaxAdmin Service]
type=service
router=cli

[Galera Listener]
type=listener
service=Galera Service
protocol=MySQLClient
port=3306
#ssl=required
#ssl_version=TLSv12
#ssl_cert=/etc/mysql/ssl/server-cert.pem
#ssl_key=/etc/mysql/ssl/server-key.pem
#ssl_ca_cert=/etc/mysql/ssl/ca-cert.pem
#ssl_cert_verify_depth=1

[MaxAdmin Listener]
type=listener
service=MaxAdmin Service
protocol=maxscaled
socket=default

PC
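Because the only intended difference between the two runs is the listener's TLS settings, it may be worth confirming that mechanically. The following is a minimal Python sketch, not part of MaxScale, that parses an abridged copy of the configuration above with the standard `configparser` module and reports the `ssl` value of each server and listener section. `tls_summary` is a hypothetical helper, and the sketch assumes MaxScale treats `#` lines as comments the same way `configparser` does.

```python
import configparser

# Abridged copy of the configuration above: only the server and listener
# sections matter for TLS. The ssl lines are commented out, as in the
# non-TLS runs.
CONFIG = """
[dbnode1]
type=server
address=172.16.1.22
port=3306
protocol=MySQLBackend
priority=1

[dbnode2]
type=server
address=172.16.1.23
port=3306
protocol=MySQLBackend
priority=2

[Galera Listener]
type=listener
service=Galera Service
protocol=MySQLClient
port=3306
#ssl=required
#ssl_version=TLSv12
"""


def tls_summary(text):
    """Map every server and listener section to its 'ssl' value (None if unset).

    Hypothetical helper: relies on configparser treating lines that start
    with '#' as comments, which may not match MaxScale's parser in every detail.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    summary = {}
    for name in parser.sections():
        if parser.get(name, "type", fallback=None) in ("server", "listener"):
            summary[name] = parser.get(name, "ssl", fallback=None)
    return summary


if __name__ == "__main__":
    # Non-TLS run: no section enables ssl.
    print(tls_summary(CONFIG))
    # TLS run: uncommenting the listener lines enables ssl on the client side only.
    print(tls_summary(CONFIG.replace("#ssl", "ssl")))
```

In the TLS variant only `Galera Listener` reports `required`; the server sections stay unset, so TLS is enabled only between the client and MaxScale.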
Hi,

Thank you for testing MaxScale and reporting your findings. We haven't seen the behavior you describe, and it is certainly not expected. One possibility I can think of is that, for some reason, a TLS connection is created between the monitor and the Galera nodes when a plain one should be used. Unless the servers are configured to use TLS, the monitor should create normal, unencrypted connections.

I think you can decrypt the packets with Wireshark as long as you have the private key and the certificate. This page might help in determining what the encrypted packet was: https://wiki.wireshark.org/SSL

As this is not expected behavior, I would like to ask you to file a bug report on the MariaDB Jira under the MaxScale project. Hopefully we'll get to the bottom of this soon.

Markus

On 12/10/17 19:39, Pak Chan wrote:
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp
-- 
Markus Mäkelä, Software Engineer
MariaDB Corporation
t: +358 40 7740484 | Skype: markus.j.makela
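Markus's suggestion leaves one step open: once Wireshark has decrypted the stream, the MySQL packet itself still has to be read. Wireshark's MySQL dissector may decode it directly, but for a raw byte dump the following Python sketch decodes an ERR packet using the MySQL client/server protocol layout (3-byte length, sequence id, 0xFF marker, 2-byte error code, `#` plus a five-character SQL state, then the message). `parse_err_packet` and `build_err_packet` are hypothetical helpers, the sketch assumes the protocol-4.1 format, and the code and message in the usage line are made-up sample values. Note that decrypting with the server's private key only works for sessions that did not use an ephemeral (DHE/ECDHE) key exchange.

```python
import struct


def parse_err_packet(data):
    """Decode a MySQL ERR packet: 4-byte header, then a payload starting with 0xFF.

    Assumes the protocol-4.1 layout, where the error code is followed by a
    '#' marker and a five-character SQL state.
    """
    if len(data) < 4:
        raise ValueError("too short for a packet header")
    length = int.from_bytes(data[0:3], "little")  # 3-byte little-endian payload length
    seq = data[3]                                 # 1-byte sequence id
    payload = data[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    if length < 3 or payload[0] != 0xFF:
        raise ValueError("not an ERR packet")
    (code,) = struct.unpack_from("<H", payload, 1)  # 2-byte little-endian error code
    rest = payload[3:]
    sql_state = None
    if rest[:1] == b"#":
        sql_state = rest[1:6].decode("ascii")
        rest = rest[6:]
    return {"seq": seq, "code": code, "sql_state": sql_state,
            "message": rest.decode("utf-8", errors="replace")}


def build_err_packet(code, sql_state, message, seq=1):
    """Build an ERR packet in the same layout (handy for testing the parser)."""
    payload = (b"\xff" + struct.pack("<H", code) + b"#"
               + sql_state.encode("ascii") + message.encode("utf-8"))
    return len(payload).to_bytes(3, "little") + bytes([seq]) + payload


if __name__ == "__main__":
    sample = build_err_packet(2003, "HY000", "example message")
    print(parse_err_packet(sample))
```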
Thanks for the response. I won't be able to get back to it until Monday, so I'll see about decrypting the packet then and report the bug with the extra information. On 12 Oct 2017 22:49, "Markus Mäkelä" <markus.makela@mariadb.com> wrote:
Hi Markus,

I can confirm that Galeramon connects to the Galera database instances over unencrypted connections. In fact, Galeramon never loses its connection to the master node; it constantly pings it and queries it for the "wsrep_local_state" and "wsrep_local_index" status values.

Having decrypted the packet, I found that the last packet MaxScale sends to the client, before closing the connections to both the client and the master node, is a MySQL error 2003 response. However, I have now reproduced the problem with a non-TLS client connection as well.

I have tracked it down to a specific circumstance: it occurs when the master node has a HIGHER 'wsrep_local_index' value than the slave at the point the slave is brought back online. It does NOT occur when the master node has a LOWER 'wsrep_local_index' value than the slave. When it does occur, MaxScale/Galeramon reports the following (in /var/log/syslog):

maxscale[10675]: [galeramon] There are no cluster members
maxscale[10675]: Server changed state: dbnode1[172.100.1.22:3306]: lost_master. [Master, Synced, Running] -> [Running]
maxscale[10675]: [galeramon] Found cluster members
maxscale[10675]: Server changed state: dbnode1[172.100.1.22:3306]: new_master. [Running] -> [Master, Synced, Running]
maxscale[10675]: Server changed state: dbnode2[172.100.1.23:3306]: slave_up. [Down] -> [Slave, Synced, Running]

Is this a known issue, or should I report it as a new one?

On 12 Oct 2017 11:08 p.m., "Pak Chan" <pakchan9000@gmail.com> wrote:
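Pak's log excerpt shows the master briefly dropping from [Master, Synced, Running] to [Running] and back, together with "There are no cluster members", in the case where the client connection is lost. When repeating the test with different `wsrep_local_index` orderings, a small Python sketch can scan the log for that flap instead of reading it by eye. `state_changes` and `master_flaps` are hypothetical helpers; the regular expression is written only against the lines quoted in this thread, and other MaxScale versions may word these messages differently.

```python
import re

# Written against the log lines quoted above, e.g.
# "Server changed state: dbnode1[172.100.1.22:3306]: lost_master. [Master, Synced, Running] -> [Running]"
STATE_CHANGE = re.compile(
    r"Server changed state: (?P<server>[^\[]+)\[(?P<addr>[^\]]+)\]: "
    r"(?P<event>\w+)\. \[(?P<before>[^\]]*)\] -> \[(?P<after>[^\]]*)\]"
)


def _states(text):
    """Split "Master, Synced, Running" into ["Master", "Synced", "Running"]."""
    return [s.strip() for s in text.split(",") if s.strip()]


def state_changes(lines):
    """Yield (server, event, before, after) for each state-change log line."""
    for line in lines:
        m = STATE_CHANGE.search(line)
        if m:
            yield (m.group("server"), m.group("event"),
                   _states(m.group("before")), _states(m.group("after")))


def master_flaps(changes):
    """Return servers that lost the Master status and later regained it."""
    lost, flapped = set(), []
    for server, _event, before, after in changes:
        if "Master" in before and "Master" not in after:
            lost.add(server)
        elif server in lost and "Master" in after and "Master" not in before:
            flapped.append(server)
            lost.discard(server)
    return flapped


SAMPLE = """\
maxscale[10675]: [galeramon] There are no cluster members
maxscale[10675]: Server changed state: dbnode1[172.100.1.22:3306]: lost_master. [Master, Synced, Running] -> [Running]
maxscale[10675]: [galeramon] Found cluster members
maxscale[10675]: Server changed state: dbnode1[172.100.1.22:3306]: new_master. [Running] -> [Master, Synced, Running]
maxscale[10675]: Server changed state: dbnode2[172.100.1.23:3306]: slave_up. [Down] -> [Slave, Synced, Running]
"""

if __name__ == "__main__":
    changes = list(state_changes(SAMPLE.splitlines()))
    print(master_flaps(changes))  # prints ['dbnode1']
```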
participants (2)
- Markus Mäkelä
- Pak Chan