[Maria-discuss] Rolling upgrade to Galera 10.4 not working
Hello, I'm trying to do a rolling upgrade of a MariaDB 10.3 Galera cluster to to 10.4, following the instructions here <https://mariadb.com/kb/en/library/upgrading-from-mariadb-103-to-mariadb-104-with-galera-cluster/>. (I had to adjust the wsrep_provider variable for Galera 4 to /usr/lib64/galera-4/libgalera_smm.so, not mentioned in the instructions). After I start the mariadb service, one mysqld thread uses 100% of one core for about 45 minutes, then I see the following in the log and mariadb restarts, repeating the same process: Jun 20 16:24:44 rubidium mysqld: 2019-06-20 16:24:44 0 [Note] WSREP: Trying to continue unpaused monitor Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: REPL Protocols: 9 (4, 2) Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: ####### My UUID: ddb4f4da-935f-11e9-bcff-8209857266f0 Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: Server mariadb4 connected to cluster at position 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 with ID ddb4f4da-935f-11e9-bcff-8209857266f0 Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: Server status change disconnected -> connected Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: State transfer required: Jun 20 16:24:47 rubidium mysqld: Group state: 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 Jun 20 16:24:47 rubidium mysqld: Local state: 6a1fd0aa-4739-11e5-af88-42756e861245:7403368279 Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: Server status change connected -> joiner Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.100.130.24' --datadir '/var/lib/mysql/' --parent '21283' '' ''' Jun 20 16:24:48 rubidium rsyncd[26235]: rsyncd version 3.1.2 starting, listening on port 4444 Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: Prepared SST request: rsync|10.100.130.24:4444/rsync_sst Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: Cert index reset to 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 (proto: 9), state transfer needed: yes Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP: Service thread queue flushed. Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: ####### Assign initial position for certification: 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644, protocol version: 4 Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: ####### IST uuid:6a1fd0aa-4739-11e5-af88-42756e861245 f: 7403368280, l: 7403466644, STRv: 2 Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: IST receiver addr using tcp://10.100.130.24:4568 Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: Prepared IST receiver for 7403368280-7403466644, listening at: tcp:// 10.100.130.24:4568 Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP: Member 2.0 (mariadb4) requested state transfer from 'mariadb2'. Selected 1.0 (mariadb2)(SYNCED) as donor. Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 7403946566) Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [ERROR] WSREP: Slave queue grew too long while trying to request state transfer 1 time(s). Please make sure that there is at least one fully synced member in the group. Application must be restarted. Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [ERROR] WSREP: State transfer request failed unrecoverably: 35 (Resource deadlock avoided). Most likely it is due to inability to communicate with the cluster primary component. Restart required. On the donor, I see: 2019-06-20 16:24:44 0 [Note] WSREP: Trying to continue unpaused monitor 2019-06-20 16:24:48 0 [Note] WSREP: Member 2.0 (mariadb4) requested state transfer from 'mariadb2'. Selected 1.0 (mariadb2)(SYNCED) as donor. 2019-06-20 16:24:48 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 7403946566) 2019-06-20 16:24:48 18 [Note] WSREP: IST request: 6a1fd0aa-4739-11e5-af88-42756e861245:7403368279-7403466644|tcp:// 10.100.130.24:4568 2019-06-20 16:24:48 18 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 2019-06-20 16:24:48 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass' 2019-06-20 16:24:48 18 [Note] WSREP: sst_donor_thread signaled with 0 2019-06-20 16:24:48 0 [Note] WSREP: async IST sender starting to serve tcp://10.100.130.24:4568 sending 7403368280-7403466644 2019-06-20 16:24:48 0 [Note] WSREP: (700b363f, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.100.130.24:4567 2019-06-20 16:24:48 0 [Note] WSREP: declaring 116c0b4c at tcp:// 10.100.130.22:4567 stable 2019-06-20 16:24:48 0 [Note] WSREP: forgetting ddb4f4da (tcp:// 10.100.130.24:4567) 2019-06-20 16:24:48 0 [Note] WSREP: (700b363f, 'tcp://0.0.0.0:4567') turning message relay requesting off 2019-06-20 16:24:48 0 [Note] WSREP: Node 116c0b4c state prim 2019-06-20 16:24:48 0 [Note] WSREP: view(view_id(PRIM,116c0b4c,297) memb { 116c0b4c,0 700b363f,0 } joined { } left { } partitioned { ddb4f4da,0 }) 2019-06-20 16:24:48 0 [Note] WSREP: save pc into disk 2019-06-20 16:24:48 0 [Note] WSREP: forgetting ddb4f4da (tcp:// 10.100.130.24:4567) 2019-06-20 16:24:48 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID. 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 286f5ce4-9367-11e9-acee-4e4dec61d7c1 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: got state msg: 286f5ce4-9367-11e9-acee-4e4dec61d7c1 from 0 (mariadb1) 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: got state msg: 286f5ce4-9367-11e9-acee-4e4dec61d7c1 from 1 (mariadb2) 2019-06-20 16:24:48 0 [Note] WSREP: Quorum results: version = 4, component = PRIMARY, conf_id = 269, members = 2/2 (joined/total), act_id = 7403946566, last_appl. = 7403946444, protocols = 0/9/3 (gcs/repl/appl), group UUID = 6a1fd0aa-4739-11e5-af88-42756e861245 2019-06-20 16:24:48 0 [Note] WSREP: Flow-control interval: [23, 23] 2019-06-20 16:24:48 0 [Note] WSREP: Trying to continue unpaused monitor 2019-06-20 16:24:48 14 [Note] WSREP: New cluster view: global state: 6a1fd0aa-4739-11e5-af88-42756e861245:7403946566, view# 270: Primary, number of nodes: 2, my index: 1, protocol version 3 2019-06-20 16:24:48 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 2019-06-20 16:24:48 14 [Note] WSREP: REPL Protocols: 9 (4, 2) 2019-06-20 16:24:48 14 [Note] WSREP: Assign initial position for certification: 7403946566, protocol version: 4 2019-06-20 16:24:48 0 [Note] WSREP: Service thread queue flushed. 2019-06-20 16:24:48 0 [ERROR] WSREP: async IST sender failed to serve tcp:// 10.100.130.24:4568: ist send failed: asio.system:104', asio error 'read: Connection reset by peer': 104 (Connection reset by peer) at galera/src/ist.cpp:send():790 2019-06-20 16:24:48 0 [Note] WSREP: async IST sender served WSREP_SST: [INFO] Bypassing state dump. (20190620 16:24:48.149) rsync: safe_read failed to read 1 bytes [sender]: Connection reset by peer (104) rsync error: error in rsync protocol data stream (code 12) at io.c(276) [sender=3.1.2] 2019-06-20 16:24:48 0 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass 2019-06-20 16:24:48 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass: 12 (Cannot allocate memory) 2019-06-20 16:24:48 0 [ERROR] WSREP: Command did not run: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass 2019-06-20 16:24:48 0 [Warning] WSREP: Could not find peer: ddb4f4da-935f-11e9-bcff-8209857266f0 2019-06-20 16:24:48 0 [Warning] WSREP: 1.0 (mariadb2): State transfer to -1.-1 (left the group) failed: -12 (Cannot allocate memory) What is going on?
Anyone? On Thu, Jun 20, 2019 at 5:59 PM Tiemen Ruiten <t.ruiten@tech-lab.io> wrote:
Hello,
I'm trying to do a rolling upgrade of a MariaDB 10.3 Galera cluster to to 10.4, following the instructions here <https://mariadb.com/kb/en/library/upgrading-from-mariadb-103-to-mariadb-104-with-galera-cluster/>. (I had to adjust the wsrep_provider variable for Galera 4 to /usr/lib64/galera-4/libgalera_smm.so, not mentioned in the instructions). After I start the mariadb service, one mysqld thread uses 100% of one core for about 45 minutes, then I see the following in the log and mariadb restarts, repeating the same process:
Jun 20 16:24:44 rubidium mysqld: 2019-06-20 16:24:44 0 [Note] WSREP: Trying to continue unpaused monitor Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: REPL Protocols: 9 (4, 2) Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: ####### My UUID: ddb4f4da-935f-11e9-bcff-8209857266f0 Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: Server mariadb4 connected to cluster at position 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 with ID ddb4f4da-935f-11e9-bcff-8209857266f0 Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: Server status change disconnected -> connected Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: State transfer required: Jun 20 16:24:47 rubidium mysqld: Group state: 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 Jun 20 16:24:47 rubidium mysqld: Local state: 6a1fd0aa-4739-11e5-af88-42756e861245:7403368279 Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: Server status change connected -> joiner Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.100.130.24' --datadir '/var/lib/mysql/' --parent '21283' '' ''' Jun 20 16:24:48 rubidium rsyncd[26235]: rsyncd version 3.1.2 starting, listening on port 4444 Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: Prepared SST request: rsync|10.100.130.24:4444/rsync_sst Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: Cert index reset to 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 (proto: 9), state transfer needed: yes Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP: Service thread queue flushed. Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: ####### Assign initial position for certification: 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644, protocol version: 4 Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: ####### IST uuid:6a1fd0aa-4739-11e5-af88-42756e861245 f: 7403368280, l: 7403466644, STRv: 2 Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: IST receiver addr using tcp://10.100.130.24:4568 Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: Prepared IST receiver for 7403368280-7403466644, listening at: tcp:// 10.100.130.24:4568 Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP: Member 2.0 (mariadb4) requested state transfer from 'mariadb2'. Selected 1.0 (mariadb2)(SYNCED) as donor. Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 7403946566) Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [ERROR] WSREP: Slave queue grew too long while trying to request state transfer 1 time(s). Please make sure that there is at least one fully synced member in the group. Application must be restarted. Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [ERROR] WSREP: State transfer request failed unrecoverably: 35 (Resource deadlock avoided). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
On the donor, I see:
2019-06-20 16:24:44 0 [Note] WSREP: Trying to continue unpaused monitor 2019-06-20 16:24:48 0 [Note] WSREP: Member 2.0 (mariadb4) requested state transfer from 'mariadb2'. Selected 1.0 (mariadb2)(SYNCED) as donor. 2019-06-20 16:24:48 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 7403946566) 2019-06-20 16:24:48 18 [Note] WSREP: IST request: 6a1fd0aa-4739-11e5-af88-42756e861245:7403368279-7403466644|tcp:// 10.100.130.24:4568 2019-06-20 16:24:48 18 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 2019-06-20 16:24:48 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass' 2019-06-20 16:24:48 18 [Note] WSREP: sst_donor_thread signaled with 0 2019-06-20 16:24:48 0 [Note] WSREP: async IST sender starting to serve tcp://10.100.130.24:4568 sending 7403368280-7403466644 2019-06-20 16:24:48 0 [Note] WSREP: (700b363f, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp:// 10.100.130.24:4567 2019-06-20 16:24:48 0 [Note] WSREP: declaring 116c0b4c at tcp:// 10.100.130.22:4567 stable 2019-06-20 16:24:48 0 [Note] WSREP: forgetting ddb4f4da (tcp:// 10.100.130.24:4567) 2019-06-20 16:24:48 0 [Note] WSREP: (700b363f, 'tcp://0.0.0.0:4567') turning message relay requesting off 2019-06-20 16:24:48 0 [Note] WSREP: Node 116c0b4c state prim 2019-06-20 16:24:48 0 [Note] WSREP: view(view_id(PRIM,116c0b4c,297) memb { 116c0b4c,0 700b363f,0 } joined { } left { } partitioned { ddb4f4da,0 }) 2019-06-20 16:24:48 0 [Note] WSREP: save pc into disk 2019-06-20 16:24:48 0 [Note] WSREP: forgetting ddb4f4da (tcp:// 10.100.130.24:4567) 2019-06-20 16:24:48 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID. 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 286f5ce4-9367-11e9-acee-4e4dec61d7c1 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: got state msg: 286f5ce4-9367-11e9-acee-4e4dec61d7c1 from 0 (mariadb1) 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: got state msg: 286f5ce4-9367-11e9-acee-4e4dec61d7c1 from 1 (mariadb2) 2019-06-20 16:24:48 0 [Note] WSREP: Quorum results: version = 4, component = PRIMARY, conf_id = 269, members = 2/2 (joined/total), act_id = 7403946566, last_appl. = 7403946444, protocols = 0/9/3 (gcs/repl/appl), group UUID = 6a1fd0aa-4739-11e5-af88-42756e861245 2019-06-20 16:24:48 0 [Note] WSREP: Flow-control interval: [23, 23] 2019-06-20 16:24:48 0 [Note] WSREP: Trying to continue unpaused monitor 2019-06-20 16:24:48 14 [Note] WSREP: New cluster view: global state: 6a1fd0aa-4739-11e5-af88-42756e861245:7403946566, view# 270: Primary, number of nodes: 2, my index: 1, protocol version 3 2019-06-20 16:24:48 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 2019-06-20 16:24:48 14 [Note] WSREP: REPL Protocols: 9 (4, 2) 2019-06-20 16:24:48 14 [Note] WSREP: Assign initial position for certification: 7403946566, protocol version: 4 2019-06-20 16:24:48 0 [Note] WSREP: Service thread queue flushed. 2019-06-20 16:24:48 0 [ERROR] WSREP: async IST sender failed to serve tcp://10.100.130.24:4568: ist send failed: asio.system:104', asio error 'read: Connection reset by peer': 104 (Connection reset by peer) at galera/src/ist.cpp:send():790 2019-06-20 16:24:48 0 [Note] WSREP: async IST sender served WSREP_SST: [INFO] Bypassing state dump. (20190620 16:24:48.149) rsync: safe_read failed to read 1 bytes [sender]: Connection reset by peer (104) rsync error: error in rsync protocol data stream (code 12) at io.c(276) [sender=3.1.2] 2019-06-20 16:24:48 0 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass 2019-06-20 16:24:48 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass: 12 (Cannot allocate memory) 2019-06-20 16:24:48 0 [ERROR] WSREP: Command did not run: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass 2019-06-20 16:24:48 0 [Warning] WSREP: Could not find peer: ddb4f4da-935f-11e9-bcff-8209857266f0 2019-06-20 16:24:48 0 [Warning] WSREP: 1.0 (mariadb2): State transfer to -1.-1 (left the group) failed: -12 (Cannot allocate memory)
What is going on?
On Thu, 20 Jun 2019 17:59:26 +0200 Tiemen Ruiten <t.ruiten@tech-lab.io> wrote:
On the donor, I see:
... 2019-06-20 16:24:48 0 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass 2019-06-20 16:24:48 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass: 12 (Cannot allocate memory) 2019-06-20 16:24:48 0 [ERROR] WSREP: Command did not run: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass 2019-06-20 16:24:48 0 [Warning] WSREP: Could not find peer: ddb4f4da-935f-11e9-bcff-8209857266f0 2019-06-20 16:24:48 0 [Warning] WSREP: 1.0 (mariadb2): State transfer to -1.-1 (left the group) failed: -12 (Cannot allocate memory)
What is going on?
The last lines are the biggest hint - Cannot allocate memory on the donor.
I saw that, but please note that this only happens 45 minutes after I start the mariadb service while attempting an SST (because the IST failed?). All that time I see one mysqld thread spinning, but nothing in the logs. To be sure I increased RAM on the donor to 32G so there shouldn't be any issues allocating memory. On Fri, Jun 28, 2019 at 2:07 AM Daniel Black <daniel@linux.ibm.com> wrote:
On Thu, 20 Jun 2019 17:59:26 +0200 Tiemen Ruiten <t.ruiten@tech-lab.io> wrote:
On the donor, I see:
... 2019-06-20 16:24:48 0 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass 2019-06-20 16:24:48 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass: 12 (Cannot allocate memory) 2019-06-20 16:24:48 0 [ERROR] WSREP: Command did not run: wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0' --bypass 2019-06-20 16:24:48 0 [Warning] WSREP: Could not find peer: ddb4f4da-935f-11e9-bcff-8209857266f0 2019-06-20 16:24:48 0 [Warning] WSREP: 1.0 (mariadb2): State transfer to -1.-1 (left the group) failed: -12 (Cannot allocate memory)
What is going on?
The last lines are the biggest hint - Cannot allocate memory on the donor.
participants (2)
-
Daniel Black
-
Tiemen Ruiten