Re: [Maria-discuss] Can't connect to local MySQL server through socket
Hi

I'll try hard to fix the problem with rsync, even if I switch to mysqldump or xtrabackup-v2 in the future.

Some context: mariadb and galera are built in buildroot and run on Raspberry Pis (ARMv6). Mariadb is started manually. I fixed the config file as you suggested yesterday (wsrep_cluster_address=gcomm://IP1,IP2,IP3 in my.cnf on all three nodes). I bootstrap the cluster on node #1 with --wsrep-new-cluster and I just start mysqld_safe on nodes #2 and #3.

Running:

    mysql -u root --execute="SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_ready', 'wsrep_cluster_size', 'wsrep_cluster_status', 'wsrep_connected');"

on node #1 gives me:

    +----------------------+---------+
    | Variable_name        | Value   |
    +----------------------+---------+
    | wsrep_cluster_size   | 3       |
    | wsrep_cluster_status | Primary |
    | wsrep_connected      | ON      |
    | wsrep_ready          | ON      |
    +----------------------+---------+

and on nodes #2 and #3:

    ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (111 "Connection refused")

If I start mariadb outside a cluster on nodes #2 and #3 it works as expected.

It seems nodes #2 and #3 are never synced, since SYNCED never appears in the error log.

wsrep_sst_method is rsync.

On 24/06/2015 15:00, Guillaume Lefranc wrote:
>> Does this mean that mariadb tries to sync nodes but for some reason the script hangs, the nodes are never synced and remain unusable?
>
> I'm afraid that must be something along those lines. I have the following suspicions:
>
> * SELinux or Apparmor not disabled, causing the SST to hang forever.
> * Ports not open on the firewall (for the record you need 3306, but also 4444, 4567 and 4568).

Neither SELinux nor Apparmor is enabled and there is no firewall on any of the boxes. They are all connected to the same ethernet switch.

I'm investigating the wsrep_sst_rsync script. Here are the processes running:

    # ps auxf | grep mysql
      128 root  {mysqld_safe} /bin/sh /usr/bin/mysqld_safe --defaults-file=/etc/mysql/my.cnf
      422 mysql /usr/bin/mysqld --defaults-file=/etc/mysql/my.cnf --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/plugin --user=mysql --wsrep_provider=/usr/lib/libgalera_smm.so --log-error=/var/lib/mysql/buildroot.err --pid-file=/tmp/mysql.pid --socket=/tmp/mysql.sock --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
      429 mysql sh -c wsrep_sst_rsync --role 'joiner' --address '192.168.1.80' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '422' --binlog '/var/lib/mysql/mariadb-bin'
      430 mysql {wsrep_sst_rsync} /bin/bash -ue /usr//bin/wsrep_sst_rsync --role joiner --address 192.168.1.80 --auth --datadir /var/lib/mysql/ --defaults-file /etc/mysql/my.cnf --parent 422 --binlog /var/lib/mysql/mariadb-bin
      458 mysql rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf
    26966 mysql sleep 0.2

Note the sleep 0.2: it shows up often. It appears in wsrep_sst_rsync in two places:

lines 136--140:

    # wait for tables flushed and state ID written to the file
    while [ ! -r "$FLUSHED" ] && ! grep -q ':' "$FLUSHED" >/dev/null 2>&1
    do
        sleep 0.2
    done

lines 281--284:

    until check_pid_and_port $RSYNC_PID $RSYNC_REAL_PID $RSYNC_PORT
    do
        sleep 0.2
    done

Is it possible that one of these loops never terminates?

If I run the same commands invoked by mysql manually, they terminate with errors because SST_PROGRESS_FILE and WSREP_SST_OPT_ROLE aren't defined, but this is probably not significant.

I'll be happy to read any idea you might have. Maybe I should also send this message to the galera mailing list?

Cheers,

-- 
Sylvain Raybaud
www.green-communications.fr
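A note on the two polling loops quoted above: neither has an upper bound, so if the awaited condition can never become true on a given system, the script keeps sleeping and retrying indefinitely. The following is only a sketch of how such a loop could be bounded. The helper name wait_until and its arguments are hypothetical and are not part of wsrep_sst_rsync.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of wsrep_sst_rsync: poll a command until it
# succeeds, or give up after a maximum number of attempts.
# Usage: wait_until MAX_TRIES COMMAND [ARGS...]
wait_until() {
    max_tries=$1
    shift
    tries=0
    until "$@"
    do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            # Report the failure on stderr instead of looping forever.
            echo "wait_until: gave up after $tries tries: $*" >&2
            return 1
        fi
        sleep 1   # whole seconds only: fractional sleep is a non-POSIX extension
    done
    return 0
}
```

With such a helper, a call like wait_until 60 check_pid_and_port ... would return a non-zero status after about a minute instead of hanging, which makes the failing check visible in the error log.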
Sylvain,

I don't know much about what buildroot does, so I don't know if you're hitting any limitation that buildroot might have.

Just a suggestion: you can try adding "set -x" to /usr/bin/wsrep_sst_rsync, so the script will dump its output in the log. You should then be able to see precisely where it hangs.

You can also try to run the SST command manually on the nodes, and see what it does. You can get the full command from the ps output, so you're free to start a donor on one node and a joiner on another node and follow the script output.

Best,
_______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp
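For readers unfamiliar with the suggestion above: "set -x" makes the shell print each command, after expansion, to standard error before running it, each line prefixed with "+" by default. The sketch below shows the effect on an arbitrary command, with the trace appended to a file. The helper name trace_to_file and the log path in the usage note are hypothetical. It is an illustration, not a change to wsrep_sst_rsync.

```shell
#!/bin/sh
# Hypothetical helper: run a command in a subshell with xtrace enabled and
# append everything written to stderr (including the trace) to a log file.
# Usage: trace_to_file LOG_FILE COMMAND [ARGS...]
trace_to_file() {
    log_file=$1
    shift
    (
        exec 2>>"$log_file"   # stderr of the subshell goes to the log
        set -x                # print each command before it runs
        "$@"
    )
}
```

For example, trace_to_file /tmp/trace.log some_shell_function would record every command that function runs. Only commands executed by this shell are traced. An external script started this way shows up as a single line, which is why the suggestion is to put "set -x" inside the script itself.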
Guillaume,

On 25/06/2015 13:40, Guillaume Lefranc wrote:
> Just a suggestion, you can try adding "set -x" to /usr/bin/wsrep_sst_rsync, so the script will dump its output in the log. You should be able to know where it hangs precisely then.

It gives me the following sequence repeating forever:

    + check_pid_and_port /var/lib/mysql//rsync_sst.pid 20189 4444
    + local pid_file=/var/lib/mysql//rsync_sst.pid
    + local rsync_pid=20189
    + local rsync_port=4444
    + which lsof
    ++ lsof -i :4444 -Pn
    ++ grep '(LISTEN)'
    + local port_info=
    ++ echo
    ++ grep -w '^rsync[[:space:]]\+20189'
    + local is_rsync=
    + '[' -n '' -a -z '' ']'

It seems to correspond to lines 281--284:

    until check_pid_and_port $RSYNC_PID $RSYNC_REAL_PID $RSYNC_PORT
    do
        sleep 0.2
    done

check_pid_and_port seems to be checking that rsync is running and listening, basically. The strange thing is that lsof -i :4444 -Pn doesn't return anything, although ps shows that rsync was invoked correctly:

    rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

Actually, lsof -i :4444 -Pn seems to behave rather differently on my laptop (ubuntu) and on buildroot. Indeed, lsof in buildroot is provided by busybox by default. This sometimes leads to significant differences. I'm going to rebuild my system with the real lsof package and see if it gets better. I'll let you know.

> You can also try to run the SST command manually on the nodes, and see what it does. You can get the full command output in ps so you're free to start a donor on one node and a joiner on another node and follow the script output.

I did, and it fails because some variables are unbound. I think this is specific to manual invocation.

-- 
Sylvain Raybaud
www.green-communications.fr
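Since the message above traces the problem to lsof being a BusyBox applet, here is a minimal sketch of one way to check where a tool comes from. It assumes that applets are installed as symbolic links to a binary whose name starts with "busybox" and that readlink -f is available. Hard-linked applets would not be detected this way. The function name is_busybox_applet is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: tell whether a command name resolves, via symlinks,
# to a binary whose file name starts with "busybox".
# Returns 0 if it does, 1 if it resolves to something else,
# 2 if the command is not found or has no file path (builtin, alias, function).
is_busybox_applet() {
    tool_path=$(command -v "$1") || return 2
    case $tool_path in
        /*) ;;            # absolute path: an executable file on PATH
        *) return 2 ;;    # builtin, alias or function: nothing to resolve
    esac
    real_path=$(readlink -f "$tool_path") || return 2
    case ${real_path##*/} in
        busybox*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A loop such as "for t in lsof sleep ps xargs; do is_busybox_applet "$t" && echo "$t: busybox"; done" would then list which of the tools used by the SST script are applets on a given node.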
Well, if you had said that your shell was busybox in the first place, that would have saved us a lot of time... :-)
On 25/06/2015 14:42, Guillaume Lefranc wrote:
> Well, if you had said that your shell was busybox in the first place, that would have saved us a lot of time... :-)

It's not my shell, my shell is regular bash, but many other tools, yes.

Connecting to mysql daemon still fails with the same error, but at least lsof is now working as expected. That's progress :)

-- 
Sylvain Raybaud
www.green-communications.fr
Hi all

Finally I managed to get everything up and running by using regular (i.e. non-busybox) versions of:

* sleep
* ps
* xargs

Hope this helps. Next steps are, not necessarily in this order:

* propose patches for optional components that do not build in buildroot
* create a buildroot package for xtrabackup
* package and use galera arbitrator
* create scripts for automating cluster initialisation and node joining (the idea is to have mariadb galera cluster working in a very unstable network)

I'll let you know.

Cheers,
Sylvain

-- 
Sylvain Raybaud
www.green-communications.fr
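The message above does not say what exactly differed in the BusyBox versions of sleep, ps and xargs. One behaviour that can be probed directly is whether the local sleep accepts the fractional duration that the script passes ("sleep 0.2"), since fractional durations are an extension rather than something POSIX guarantees. The sketch below is such a probe. The function name probe_fractional_sleep is hypothetical, and a "rejected" result is only a hint, not a diagnosis of this thread's problem.

```shell
#!/bin/sh
# Hypothetical probe: print "accepted" if the sleep found on PATH exits with
# status 0 when given a fractional duration, "rejected" otherwise.
probe_fractional_sleep() {
    if sleep 0.1 2>/dev/null; then
        echo "accepted"
    else
        echo "rejected"
    fi
}

echo "fractional sleep: $(probe_fractional_sleep)"
```

If the probe prints "rejected", any polling loop that relies on "sleep 0.2" would see sleep fail on every iteration, so it is worth checking before debugging further.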
participants (2)
- Guillaume Lefranc
- Sylvain Raybaud