Sylvain,

I don't know much about what buildroot does, so I can't tell whether you're hitting a limitation that buildroot might have.

Just a suggestion: you can try adding "set -x" to /usr/bin/wsrep_sst_rsync, so that the script dumps a trace of the commands it runs into the log. You should then be able to see precisely where it hangs. You can also try running the SST command manually on the nodes to see what it does. The full command line is visible in ps, so you're free to start a donor on one node and a joiner on another node and follow the script output.

Best,

2015-06-25 12:27 GMT+02:00 Sylvain Raybaud <sylvain.raybaud@green-communications.fr>:
Hi
I'll try hard to fix the problem with rsync, even if I switch to mysqldump or xtrabackup-v2 in the future.
Some context: MariaDB and Galera are built with buildroot and run on Raspberry Pis (ARMv6). MariaDB is started manually. I fixed the config file as you suggested yesterday (wsrep_cluster_address=gcomm://IP1,IP2,IP3 in my.cnf on all three nodes). I bootstrap the cluster on node #1 with --wsrep-new-cluster, and I just start mysqld_safe on nodes #2 and #3.
Running:

mysql -u root --execute="SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_ready', 'wsrep_cluster_size', 'wsrep_cluster_status', 'wsrep_connected');"

on node #1 gives me:

+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_size   | 3       |
| wsrep_cluster_status | Primary |
| wsrep_connected      | ON      |
| wsrep_ready          | ON      |
+----------------------+---------+

and on nodes #2 and #3:

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (111 "Connection refused")
If I start mariadb outside a cluster on nodes #2 and #3 it works as expected.
It seems nodes #2 and #3 are never synced, since SYNCED never appears in the error log.
wsrep_sst_method is rsync.
On 24/06/2015 15:00, Guillaume Lefranc wrote:

Does this mean that mariadb tries to sync nodes but for some reason the script hangs, the nodes are never sync and remain unusable?

I'm afraid that must be something along those lines. I have the following suspicions:

* SELinux or Apparmor not disabled, causing the SST to hang forever.
* Ports not open on the firewall (for the record you need 3306, but also 4444, 4567 and 4568).
Neither SELinux nor AppArmor is enabled, and there is no firewall on any of the boxes. They are all connected to the same Ethernet switch.
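For the record, a quick way to double-check the ports from each box could look like the sketch below. It is only an illustration: check_port is a made-up helper, it assumes bash was built with /dev/tcp redirection and that the coreutils timeout command is available (neither is guaranteed in a buildroot image), and the hosts are passed on the command line. Port 4444 is served by the rsync daemon that the SST script starts, so it should only answer while a state transfer is running.

```shell
#!/bin/bash
# Hypothetical helper, not part of MariaDB or Galera.
# Returns 0 if a TCP connection to host:port succeeds within 2 seconds.
check_port() {
    local host=$1 port=$2
    timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Ports quoted in this thread: MySQL, SST, group communication, IST.
PORTS="3306 4444 4567 4568"

# Hosts come from the command line, e.g.: ./check_ports.sh IP1 IP2 IP3
for host in "$@"; do
    for port in $PORTS; do
        if check_port "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port closed or unreachable"
        fi
    done
done
```

Run it on each node with the other two nodes' addresses as arguments. With no arguments it does nothing.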
I'm investigating the wsrep_sst_rsync script. Here are the processes running:

# ps auxf | grep mysql
  128 root     {mysqld_safe} /bin/sh /usr/bin/mysqld_safe --defaults-file=/etc/mysql/my.cnf
  422 mysql    /usr/bin/mysqld --defaults-file=/etc/mysql/my.cnf --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/plugin --user=mysql --wsrep_provider=/usr/lib/libgalera_smm.so --log-error=/var/lib/mysql/buildroot.err --pid-file=/tmp/mysql.pid --socket=/tmp/mysql.sock --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
  429 mysql    sh -c wsrep_sst_rsync --role 'joiner' --address '192.168.1.80' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '422' --binlog '/var/lib/mysql/mariadb-bin'
  430 mysql    {wsrep_sst_rsync} /bin/bash -ue /usr//bin/wsrep_sst_rsync --role joiner --address 192.168.1.80 --auth --datadir /var/lib/mysql/ --defaults-file /etc/mysql/my.cnf --parent 422 --binlog /var/lib/mysql/mariadb-bin
  458 mysql    rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf
26966 mysql    sleep 0.2
Note the sleep 0.2: it shows up often. It appears in wsrep_sst_rsync in two places:
lines 136--140:

    # wait for tables flushed and state ID written to the file
    while [ ! -r "$FLUSHED" ] && ! grep -q ':' "$FLUSHED" >/dev/null 2>&1
    do
        sleep 0.2
    done
lines 281--284:

    until check_pid_and_port $RSYNC_PID $RSYNC_REAL_PID $RSYNC_PORT
    do
        sleep 0.2
    done
Is it possible that one of these loops never terminates?
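One way to find out, sketched below, would be to temporarily put a bound on the polling so that a loop which never succeeds reports itself instead of sleeping forever. This is only an illustration of the idea: wait_until is a made-up helper that does not exist in wsrep_sst_rsync, and the retry limit is arbitrary.

```shell
#!/bin/bash
# Hypothetical helper, not part of wsrep_sst_rsync.
# Usage: wait_until MAX_TRIES COMMAND [ARGS...]
# Runs COMMAND every 0.2 s until it succeeds; gives up after MAX_TRIES failures.
wait_until() {
    local max_tries=$1
    shift
    local tries=0
    until "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "gave up after $tries tries: $*" >&2
            return 1
        fi
        sleep 0.2
    done
    return 0
}
```

In a debugging copy of the script, the second loop could then be written as "wait_until 150 check_pid_and_port $RSYNC_PID $RSYNC_REAL_PID $RSYNC_PORT", which would give up after roughly 30 seconds and print which check kept failing.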
If I run the same commands invoked by mysql manually, they terminate with errors because SST_PROGRESS_FILE and WSREP_SST_OPT_ROLE aren't defined but this is probably not significant.
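Those errors may simply come from the way the script is invoked: the ps output shows it running under "/bin/bash -ue", and with -u bash treats the expansion of an unset variable as a fatal error. The sketch below demonstrates that behaviour in isolation. DEMO_UNSET_VAR is a made-up name used only for illustration, and I have not checked where the real script reads its variables.

```shell
#!/bin/bash
# DEMO_UNSET_VAR is a made-up name, deliberately left undefined.
unset DEMO_UNSET_VAR

# With -u, expanding an unset variable aborts the shell with an error.
if bash -uc 'echo "value: $DEMO_UNSET_VAR"' >/dev/null 2>&1; then
    strict_result="ok"
else
    strict_result="failed"
fi

# Supplying a default in the expansion avoids the error.
if bash -uc 'echo "value: ${DEMO_UNSET_VAR:-}"' >/dev/null 2>&1; then
    default_result="ok"
else
    default_result="failed"
fi

echo "strict: $strict_result, with default: $default_result"
```

So when reproducing the call by hand, any variable the script expects from its caller has to be set first, otherwise the run stops before reaching the loops.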
I'll be happy to read any ideas you might have. Maybe I should also send this message to the Galera mailing list?
Cheers,
-- 
Sylvain Raybaud
www.green-communications.fr