Hi,

On Fri, Sep 2, 2016 at 11:50 PM, 西门吹牛 <zh1029@sina.com> wrote:
Hi,
  Thanks for the response.
  No, I started mysqld on the second node via the command "mysqld --defaults-extra-file=mdb.my.cnf --debug".

So the log below is from the first node? I ask because I see the following:

> 2016-09-01 16:41:38 140716544316288 [Note] WSREP: 'wsrep-new-cluster' option used, bootstrapping the cluster

 
  The problem is permanent.
  No abnormal output is seen on the console. I feel MariaDB may be stuck somewhere. From the trace log (attached mdb.mysqld.trace), it looks stuck at wsrep_replication_process. It may be waiting for some message from the primary node, but I have no idea what it is waiting for.

Apparently, it's waiting for SST, as indicated by the last few lines of the attached trace:

> T@4    : | | | | enter: buffer: WSREP: Gap in state sequence. Need state transfer.

You can also use --wsrep-debug=ON to collect more details on both nodes.
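
A rough way to see how far a node got is to filter its error log for WSREP state shifts and SST messages. The sketch below runs on a few lines copied from the log in this thread; the variable names are only for illustration, and on a real node the same grep would be pointed at the actual error log. A joiner that shows "Waiting for SST" or "Need state transfer" with no later "SST complete" is probably still waiting for the transfer.

```shell
# Sample lines copied from this thread; in practice, read the node's real error log instead.
sample_log=$(cat <<'EOF'
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-09-01 16:41:38 140716544316288 [Note] WSREP: Waiting for SST to complete.
2016-09-01 16:41:38 140715987670784 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
2016-09-01 16:41:38 140716544316288 [Note] WSREP: SST complete, seqno: 0
2016-09-01 16:41:38 140716544316288 [Note] InnoDB: The InnoDB memory heap is disabled
EOF
)

# Keep only state shifts and state-transfer messages, in log order.
wsrep_progress=$(printf '%s\n' "$sample_log" | grep -E 'WSREP: (Shifting|.*SST|.*state transfer)')
printf '%s\n' "$wsrep_progress"
```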

Thanks,
Nirbhay

 

--------------------------------


----- Original Message -----
From: Nirbhay Choubey <nirbhay@mariadb.com>
To: zh1029@sina.com
Cc: maria-discuss <maria-discuss@lists.launchpad.net>, "yan-jack.chen" <yan-jack.chen@nokia.com>
Subject: Re: [Maria-discuss] Fwd: failed to connect mariadb at addition node in MariaDB galera cluster
Date: 2016-09-02 23:26

Hi,

On Thu, Sep 1, 2016 at 11:08 PM, 西门吹牛 <zh1029@sina.com> wrote:
Hi,
  I deployed MariaDB Galera on two nodes to build a cluster, but I can't connect to MariaDB on the second node. It seems MariaDB never opens the port there because it gets stuck somehow.
  Version: mysqld 10.1.17-MariaDB-debug with galera-3-25.3.17

I started MariaDB on the first node. It seems fine: port 3307 was opened and I can log in to MariaDB with the mysql client.

You shouldn't use --wsrep-new-cluster to start the 2nd node (in case you are).
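
To make the difference concrete, here is a minimal sketch of how the two start commands differ. The helper name galera_start_cmd and its role argument are made up for illustration; the config file names are the ones mentioned in this thread. It only prints the command lines, and it assumes the joiner's config already points wsrep_cluster_address at the first node.

```shell
# Hypothetical helper: print the mysqld command line for a node, given its role.
# Only the node that bootstraps the cluster gets --wsrep-new-cluster.
galera_start_cmd() {
    role=$1      # "bootstrap" for the first node, anything else for a joiner
    cnf=$2       # path to the node's extra defaults file
    if [ "$role" = "bootstrap" ]; then
        echo "mysqld --defaults-extra-file=$cnf --wsrep-new-cluster"
    else
        # Joiners start without the flag and find the cluster via their config.
        echo "mysqld --defaults-extra-file=$cnf"
    fi
}

galera_start_cmd bootstrap ./mmn.my.cnf
galera_start_cmd joiner mdb.my.cnf
```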
Regarding the hang:
Does it happen all the time? Repeatable?
Is that all you see in the error log? Nothing after the partial last line?
Would it be possible to attach a debugger to mysqld to check where exactly it hangs?

Best,
Nirbhay






[root@MMN-0(RCP-69) /root/test]

# /home/_rcpadmin/bin/mariadb/bin/mysqld --defaults-extra-file=./mmn.my.cnf --wsrep-new-cluster --debug

2016-09-01 16:41:37 140716544316288 [Note] /home/_rcpadmin/bin/mariadb/bin/mysqld (mysqld 10.1.17-MariaDB-debug) starting as process 15248 ...

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Setting wsrep_ready to 0

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Read nil XID from storage engines, skipping position init

2016-09-01 16:41:38 140716544316288 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'

2016-09-01 16:41:38 140716544316288 [Note] WSREP: wsrep_load(): Galera 3.17(r0) by Codership Oy <info@codership.com> loaded successfully.

2016-09-01 16:41:38 140716544316288 [Note] WSREP: CRC-32C: using hardware acceleration.

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Found saved state: 900987cc-7003-11e6-b25f-de0a52317f1d:0

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Passing config to GCS: base_dir = /mariadb/; base_host = MMN-0; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /mariadb/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mariadb//galera.cache; gcache.page_size = 300M; gcache.size = 300M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc

2016-09-01 16:41:38 140716029634304 [Note] WSREP: Service thread queue flushed.

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1

2016-09-01 16:41:38 140716544316288 [Note] WSREP: wsrep_sst_grab()

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Start replication

2016-09-01 16:41:38 140716544316288 [Note] WSREP: 'wsrep-new-cluster' option used, bootstrapping the cluster

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Setting initial position to 900987cc-7003-11e6-b25f-de0a52317f1d:0

2016-09-01 16:41:38 140716544316288 [Note] WSREP: protonet asio version 0

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Using CRC-32C for message checksums.

2016-09-01 16:41:38 140716544316288 [Note] WSREP: backend: asio

2016-09-01 16:41:38 140716544316288 [Note] WSREP: gcomm thread scheduling priority set to other:0

2016-09-01 16:41:38 140716544316288 [Warning] WSREP: access file(/mariadb//gvwstate.dat) failed(No such file or directory)

2016-09-01 16:41:38 140716544316288 [Note] WSREP: restore pc from disk failed

2016-09-01 16:41:38 140716544316288 [Note] WSREP: GMCast version 0

2016-09-01 16:41:38 140716544316288 [Note] WSREP: (e5a32d3c, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567

2016-09-01 16:41:38 140716544316288 [Note] WSREP: (e5a32d3c, 'tcp://0.0.0.0:4567') multicast: , ttl: 1

2016-09-01 16:41:38 140716544316288 [Note] WSREP: EVS version 0

2016-09-01 16:41:38 140716544316288 [Note] WSREP: gcomm: bootstrapping new group 'example_cluster'

2016-09-01 16:41:38 140716544316288 [Note] WSREP: start_prim is enabled, turn off pc_recovery

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Node e5a32d3c state prim

2016-09-01 16:41:38 140716544316288 [Note] WSREP: view(view_id(PRIM,e5a32d3c,1) memb {

        e5a32d3c,0

} joined {

} left {

} partitioned {

})

2016-09-01 16:41:38 140716544316288 [Note] WSREP: save pc into disk

2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr without UUID: tcp://169.254.0.4:4567

2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr proto entry 0x5652b0352ef0

2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr without UUID: tcp://169.254.0.5:4567

2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr proto entry 0x5652b035b720

2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr without UUID: tcp://169.254.0.6:4567

2016-09-01 16:41:38 140716544316288 [Note] WSREP: discarding pending addr proto entry 0x5652b0363ea0

2016-09-01 16:41:38 140716544316288 [Note] WSREP: gcomm: connected

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Opened channel 'example_cluster'

2016-09-01 16:41:38 140715987670784 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1

2016-09-01 16:41:38 140716544316288 [Note] WSREP: Waiting for SST to complete.

2016-09-01 16:41:38 140715987670784 [Note] WSREP: STATE_EXCHANGE: sent state UUID: e5a3bc45-701f-11e6-ba1c-471590fea490

2016-09-01 16:41:38 140715987670784 [Note] WSREP: STATE EXCHANGE: sent state msg: e5a3bc45-701f-11e6-ba1c-471590fea490

2016-09-01 16:41:38 140715987670784 [Note] WSREP: STATE EXCHANGE: got state msg: e5a3bc45-701f-11e6-ba1c-471590fea490 from 0 (MMN-0)

2016-09-01 16:41:38 140715987670784 [Note] WSREP: Quorum results:

        version    = 4,

        component  = PRIMARY,

        conf_id    = 0,

        members    = 1/1 (joined/total),

        act_id     = 0,

        last_appl. = -1,

        protocols  = 0/7/3 (gcs/repl/appl),

        group UUID = 900987cc-7003-11e6-b25f-de0a52317f1d

2016-09-01 16:41:38 140715987670784 [Note] WSREP: Flow-control interval: [16, 16]

2016-09-01 16:41:38 140715987670784 [Note] WSREP: Restored state OPEN -> JOINED (0)

2016-09-01 16:41:38 140715987670784 [Note] WSREP: Member 0.0 (MMN-0) synced with group.

2016-09-01 16:41:38 140716542806784 [Note] WSREP: New cluster view: global state: 900987cc-7003-11e6-b25f-de0a52317f1d:0, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 3

2016-09-01 16:41:38 140715987670784 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)

2016-09-01 16:41:38 140716544316288 [Note] WSREP: SST complete, seqno: 0

2016-09-01 16:41:38 140716544316288 [Note] InnoDB: Using mutexes to ref count buffer pool pages

2016-09-01 16:41:38 140716544316288 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!

2016-09-01 16:41:38 140716544316288 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_SYNC_DEBUG switched on !!!!!!!!!

2016-09-01 16:41:38 140716544316288 [Note] InnoDB: The InnoDB memory heap is disabled

2016-09-01 16:41:38 140716