[Maria-discuss] Re: Re: Fwd: failed_to_connect_mariadb_at_addition_node_in_MariaDB_galera_cluster
Hi,
Thanks for the response. No, I started mysqld on the second node with the command "mysqld --defaults-extra-file=mdb.my.cnf --debug". The problem is permanent, and there is no abnormal output on the console. I suspect MariaDB is stuck somewhere: judging from the trace log (attached as mdb.mysqld.trace), it appears to be stuck in wsrep_replication_process. It may be waiting for some message from the primary node, but I have no idea what it is waiting for.
Hi,
On Fri, Sep 2, 2016 at 11:50 PM, 西门吹牛 <zh1029@sina.com> wrote:
Hi, thanks for the response. No, I started mysqld on the second node with the command "mysqld --defaults-extra-file=mdb.my.cnf --debug".
So the log below is from the first node? I see the following:
2016-09-01 16:41:38 140716544316288 [Note] WSREP: 'wsrep-new-cluster' option used, bootstrapping the cluster
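Nirbhay's inference rests on a single marker line: a node started with --wsrep-new-cluster logs that it is bootstrapping the cluster. A minimal Python sketch of the same check, assuming the error log has been saved to a text file; `started_as_bootstrap` is a hypothetical helper, and the marker string is copied from the log in this thread rather than taken from any documented interface.

```python
# Hypothetical helper (not part of MariaDB or Galera): checks a saved
# mysqld error log for the notice that a node bootstrapped a new cluster.
# The marker text is copied from the log quoted in this thread.
BOOTSTRAP_MARKER = "'wsrep-new-cluster' option used, bootstrapping the cluster"


def started_as_bootstrap(log_text: str) -> bool:
    """Return True if the log shows the node was started with --wsrep-new-cluster."""
    return BOOTSTRAP_MARKER in log_text


if __name__ == "__main__":
    first_node_line = (
        "2016-09-01 16:41:38 140716544316288 [Note] WSREP: "
        "'wsrep-new-cluster' option used, bootstrapping the cluster"
    )
    print(started_as_bootstrap(first_node_line))  # True
    print(started_as_bootstrap("[Note] WSREP: Start replication"))  # False
```

Run against the second node's log, a True result would mean that node was also started as a bootstrap node, which is exactly what the advice further down warns against.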
The problem is permanent, and there is no abnormal output on the console. I suspect MariaDB is stuck somewhere: judging from the trace log (attached as mdb.mysqld.trace), it appears to be stuck in wsrep_replication_process. It may be waiting for some message from the primary node, but I have no idea what it is waiting for.
Apparently, it's waiting for SST (State Snapshot Transfer), as indicated by the last few lines of the attached trace:
T@4 : | | | | enter: buffer: WSREP: Gap in state sequence. Need state transfer.
You can also use --wsrep-debug=ON to collect more details on both nodes.
Thanks,
Nirbhay
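The SST diagnosis comes from matching a few specific lines: the joiner's trace reports a gap in its state sequence, and a node that finishes the transfer logs "SST complete, seqno: N" (as the first node's log in this thread does). A minimal Python sketch that reports the last SST-related event in a saved log or trace; `sst_status` and its returned labels are hypothetical, and the marker strings are copied from the trace and log quoted here, not from a documented format.

```python
import re

# Marker strings copied from the trace and error log quoted in this thread.
GAP_MARKER = "Gap in state sequence. Need state transfer"
WAITING_MARKER = "Waiting for SST to complete."
COMPLETE_RE = re.compile(r"SST complete, seqno: (-?\d+)")


def sst_status(text: str) -> str:
    """Describe the most recent SST-related event found in a log or trace.

    Hypothetical helper: later lines override earlier ones, so a log that
    shows "Waiting for SST" followed by "SST complete" reports completion.
    """
    status = "no SST markers found"
    for line in text.splitlines():
        match = COMPLETE_RE.search(line)
        if match:
            status = f"SST complete (seqno {match.group(1)})"
        elif GAP_MARKER in line:
            status = "state transfer needed; waiting for SST"
        elif WAITING_MARKER in line:
            status = "waiting for SST to complete"
    return status


if __name__ == "__main__":
    trace_tail = ("T@4 : | | | | enter: buffer: WSREP: "
                  "Gap in state sequence. Need state transfer.")
    print(sst_status(trace_tail))  # state transfer needed; waiting for SST
    first_node = "\n".join([
        "[Note] WSREP: Waiting for SST to complete.",
        "[Note] WSREP: SST complete, seqno: 0",
    ])
    print(sst_status(first_node))  # SST complete (seqno 0)
```

The same function can be pointed at the logs collected from both nodes; a joiner that never reaches "SST complete" is still blocked waiting for the transfer.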
--------------------------------
----- Original Message -----
From: Nirbhay Choubey <nirbhay@mariadb.com>
To: zh1029@sina.com
Cc: maria-discuss <maria-discuss@lists.launchpad.net>, "yan-jack.chen" <yan-jack.chen@nokia.com>
Subject: Re: [Maria-discuss] Fwd: failed_to_connect_mariadb_at_addition_node_in_MariaDB_galera_cluster
Date: 2016-09-02 23:26
Hi,
On Thu, Sep 1, 2016 at 11:08 PM, 西门吹牛 <zh1029@sina.com> wrote:
Hi, I deployed MariaDB Galera on two nodes to build a cluster, but I can't connect to MariaDB on the second node: the port never seems to be opened, apparently because mysqld is stuck somewhere. Versions: mysqld 10.1.17-MariaDB-debug with galera-3-25.3.17.
I started MariaDB on the first node and it seems fine: port 3307 was opened and I can log in to MariaDB with the mysql client.
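Whether a node has actually opened its client port (3307 in this setup) can be checked with a plain TCP connect. A minimal Python sketch, assuming the probe runs somewhere that can reach the node; `is_listening` is a hypothetical helper, and a successful connect only shows that the port accepts TCP connections.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: connection refused, timeouts and unreachable hosts
    all raise OSError subclasses, which are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 3307 is the client port used on the first node in this thread.
    print(is_listening("127.0.0.1", 3307))
```

On the second node, a False result for 3307 would match the report that the port never appears, which is consistent with mysqld still blocking during startup.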
You shouldn't use --wsrep-new-cluster to start the second node (in case you are doing so). Regarding the hang: does it happen every time? Is it repeatable? Is that all you see in the error log, with nothing after the partial last line? Would it be possible to attach a debugger to mysqld to check where exactly it hangs?
Best, Nirbhay
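The first point can be made concrete: only the node that creates the cluster is started with --wsrep-new-cluster, and every later node is started without it so that it joins the existing cluster through its wsrep_cluster_address setting. A minimal Python sketch that builds both command lines, assuming each node's option file already sets wsrep_cluster_address; `mysqld_argv` is a hypothetical helper, and the paths and file names are the ones used in this thread.

```python
from typing import List


def mysqld_argv(mysqld: str, defaults_file: str, bootstrap: bool,
                debug: bool = False) -> List[str]:
    """Build the argument list for starting one Galera node (hypothetical helper).

    --defaults-extra-file is placed immediately after the binary because
    mysqld only honours it as the first option. --wsrep-new-cluster is
    added only when this node is bootstrapping a brand-new cluster.
    """
    argv = [mysqld, f"--defaults-extra-file={defaults_file}"]
    if bootstrap:
        argv.append("--wsrep-new-cluster")
    if debug:
        argv.append("--debug")  # the thread uses a -debug build of mysqld
    return argv


if __name__ == "__main__":
    # First node: bootstraps the cluster.
    print(mysqld_argv("/home/_rcpadmin/bin/mariadb/bin/mysqld",
                      "./mmn.my.cnf", bootstrap=True, debug=True))
    # Second node: joins the running cluster, so no --wsrep-new-cluster.
    print(mysqld_argv("mysqld", "mdb.my.cnf", bootstrap=False, debug=True))
```

Either list could be passed to subprocess.Popen as-is; the point of the helper is simply that the joiner's list never contains --wsrep-new-cluster.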
[root@MMN-0(RCP-69) /root/test] # /home/_rcpadmin/bin/mariadb/bin/mysqld --defaults-extra-file=./mmn.my.cnf --wsrep-new-cluster --debug
2016-09-01 16:41:37 140716544316288 [Note] /home/_rcpadmin/bin/mariadb/bin/mysqld (mysqld 10.1.17-MariaDB-debug) starting as process 15248 ...
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Setting wsrep_ready to 0
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-09-01 16:41:38 140716544316288 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
2016-09-01 16:41:38 140716544316288 [Note] WSREP: wsrep_load(): Galera 3.17(r0) by Codership Oy <info@codership.com> loaded successfully.
2016-09-01 16:41:38 140716544316288 [Note] WSREP: CRC-32C: using hardware acceleration.
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Found saved state: 900987cc-7003-11e6-b25f-de0a52317f1d:0
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Passing config to GCS: base_dir = /mariadb/; base_host = MMN-0; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /mariadb/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mariadb//galera.cache; gcache.page_size = 300M; gcache.size = 300M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc
2016-09-01 16:41:38 140716029634304 [Note] WSREP: Service thread queue flushed.
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
2016-09-01 16:41:38 140716544316288 [Note] WSREP: wsrep_sst_grab()
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Start replication
2016-09-01 16:41:38 140716544316288 [Note] WSREP: 'wsrep-new-cluster' option used, bootstrapping the cluster
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Setting initial position to 900987cc-7003-11e6-b25f-de0a52317f1d:0
2016-09-01 16:41:38 140716544316288 [Note] WSREP: protonet asio version 0
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Using CRC-32C for message checksums.
2016-09-01 16:41:38 140716544316288 [Note] WSREP: backend: asio
2016-09-01 16:41:38 140716544316288 [Note] WSREP: gcomm thread scheduling priority set to other:0
2016-09-01 16:41:38 140716544316288 [Warning] WSREP: access file(/mariadb//gvwstate.dat) failed(No such file or directory)
2016-09-01 16:41:38 140716544316288 [Note] WSREP: restore pc from disk failed
2016-09-01 16:41:38 140716544316288 [Note] WSREP: GMCast version 0
2016-09-01 16:41:38 140716544316288 [Note] WSREP: (e5a32d3c, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-09-01 16:41:38 140716544316288 [Note] WSREP: (e5a32d3c, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-09-01 16:41:38 140716544316288 [Note] WSREP: EVS version 0
2016-09-01 16:41:38 140716544316288 [Note] WSREP: gcomm: bootstrapping new group 'example_cluster'
2016-09-01 16:41:38 140716544316288 [Note] WSREP: start_prim is enabled, turn off pc_recovery
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Node e5a32d3c state prim
2016-09-01 16:41:38 140716544316288 [Note] WSREP: view(view_id(PRIM,e5a32d3c,1) memb {
e5a32d3c,0
} joined {
} left {
} partitioned {
})
2016-09-01 16:41:38 140716544316288 [Note] WSREP: save pc into disk
2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr without UUID: tcp://169.254.0.4:4567
2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr proto entry 0x5652b0352ef0
2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr without UUID: tcp://169.254.0.5:4567
2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr proto entry 0x5652b035b720
2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr without UUID: tcp://169.254.0.6:4567
2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr proto entry 0x5652b0363ea0
2016-09-01 16:41:38 140716544316288 [Note] WSREP: gcomm: connected
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Opened channel 'example_cluster'
2016-09-01 16:41:38 140715987670784 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Waiting for SST to complete.
2016-09-01 16:41:38 140715987670784 [Note] WSREP: STATE_EXCHANGE: sent state UUID: e5a3bc45-701f-11e6-ba1c-471590fea490
2016-09-01 16:41:38 140715987670784 [Note] WSREP: STATE EXCHANGE: sent state msg: e5a3bc45-701f-11e6-ba1c-471590fea490
2016-09-01 16:41:38 140715987670784 [Note] WSREP: STATE EXCHANGE: got state msg: e5a3bc45-701f-11e6-ba1c-471590fea490 from 0 (MMN-0)
2016-09-01 16:41:38 140715987670784 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 0,
members = 1/1 (joined/total),
act_id = 0,
last_appl. = -1,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 900987cc-7003-11e6-b25f-de0a52317f1d
2016-09-01 16:41:38 140715987670784 [Note] WSREP: Flow-control interval: [16, 16]
2016-09-01 16:41:38 140715987670784 [Note] WSREP: Restored state OPEN -> JOINED (0)
2016-09-01 16:41:38 140715987670784 [Note] WSREP: Member 0.0 (MMN-0) synced with group.
2016-09-01 16:41:38 140716542806784 [Note] WSREP: New cluster view: global state: 900987cc-7003-11e6-b25f-de0a52317f1d:0, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 3
2016-09-01 16:41:38 140715987670784 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
2016-09-01 16:41:38 140716544316288 [Note] WSREP: SST complete, seqno: 0
2016-09-01 16:41:38 140716544316288 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-09-01 16:41:38 140716544316288 [Note] InnoDB: InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
2016-09-01 16:41:38 140716544316288 [Note] InnoDB: InnoDB: !!!!!!!! UNIV_SYNC_DEBUG switched on !!!!!!!!!
2016-09-01 16:41:38 140716544316288 [Note] InnoDB: The InnoDB memory heap is disabled
2016-09-01 16:41:38 140716
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp