[Maria-developers] Fix for TokuDB and parallel replication
Parallel replication so far did not work well with TokuDB, as some people who tried it found out. I have now pushed to 10.1 some patches to solve the problems. There are two main fixes:

1. Fix some races where a waiting transaction would miss its wakeup and get a lock wait timeout on a row lock, even though the lock had already been released by the holding transaction. This fix is due to great work by Rich Prohaska. The problem is not actually specific to replication; normal transactions on a master can hit it too. But it hurts replication a lot, since replication must commit transactions in order, so one stalled transaction stalls all following transactions as well.

2. Implement the conflict detection and handling necessary for optimistic parallel replication to work. This basically implements the thd_rpl_deadlock_check() and hton->kill_query methods. This should solve the problems where optimistic parallel replication with TokuDB breaks with lock wait timeouts.

If someone wants to test it, I have made a tree available with just these fixes on top of MariaDB 10.1.19:

https://github.com/knielsen/server/tree/tokudb_optimistic_parallel_replicati...

The fix should appear in 10.1.20 eventually.

Rich has also submitted the first part of the patch upstream. When it is (hopefully) merged upstream, and upstream is merged into MariaDB, the MariaDB version of the fix should be replaced with the Percona one. I tried making the MariaDB version of the fix identical to Rich's pull request and keeping it in a separate commit, so this should hopefully be simple to do when the time comes.

 - Kristian.
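To make the first fix concrete, here is a minimal sketch of the missed-wakeup race it addresses. This is illustrative pthread-style code, not the actual TokuDB locktree code; all names are made up. The bug class is: the waiter tests the lock state and then blocks as two separate steps, so a release that signals in the gap is lost and the waiter sleeps until the lock wait timeout:

  #include <cerrno>
  #include <ctime>
  #include <pthread.h>

  struct lock_wait {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool granted;            // set by the transaction releasing the lock
  };

  // Correct pattern: hold the mutex across the check and the wait, and
  // re-test the predicate in a loop. A wakeup that arrives just before we
  // would sleep (or a spurious wakeup) can then never be lost; only a
  // genuine timeout remains.
  int wait_for_lock(lock_wait *w, const struct timespec *deadline)
  {
    int err= 0;
    pthread_mutex_lock(&w->mutex);
    while (!w->granted && err != ETIMEDOUT)
      err= pthread_cond_timedwait(&w->cond, &w->mutex, deadline);
    bool ok= w->granted;
    pthread_mutex_unlock(&w->mutex);
    return ok ? 0 : ETIMEDOUT;   // ETIMEDOUT maps to a lock wait timeout
  }

  // The releaser must set the flag and signal under the same mutex.
  void grant_lock(lock_wait *w)
  {
    pthread_mutex_lock(&w->mutex);
    w->granted= true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->mutex);
  }

And for the second fix, a rough sketch of how a storage engine wires into these two hooks. Only the two hook names come from the mail above; everything prefixed my_engine_ is hypothetical, and the exact signatures should be checked against sql/handler.h in the tree:

  // Types THD, MYSQL_THD, handlerton, thd_kill_levels come from the
  // server/plugin headers. Assumed server declaration:
  extern "C" int thd_rpl_deadlock_check(MYSQL_THD thd, MYSQL_THD other_thd);

  // Called from the engine's lock manager whenever waiter_thd's transaction
  // is about to block on a lock held by holder_thd's transaction. The server
  // checks whether holder_thd is a replication transaction that must commit
  // *after* waiter_thd; if so, it kills the holder (which is rolled back and
  // retried) instead of letting the in-order commit wait end in a timeout.
  static void my_engine_report_lock_wait(THD *waiter_thd, THD *holder_thd)
  {
    thd_rpl_deadlock_check(waiter_thd, holder_thd);
  }

  // hton->kill_query: called back by the server so the engine can abort any
  // lock wait the victim is currently blocked in (internals hypothetical).
  static void my_engine_kill_query(handlerton *hton, THD *victim_thd,
                                   enum thd_kill_levels level)
  {
    my_engine_trx *trx= my_engine_get_trx(hton, victim_thd);
    if (trx)
      my_engine_cancel_lock_waits(trx);   // wakes the waiter with an error
  }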
Hi Kristian!

Thanks for the fix, I should be able to give it a try sometime this week.

  Jocelyn
Hi Kristian,

I've just tried your tokudb_optimistic_parallel_replication branch, and it behaves very strangely: the SQL thread stops by itself, without any replication error, when the parallel_mode is set to optimistic.

In the error.log:

2016-12-09 11:48:05 140259564768000 [Note] Slave I/O thread: connected to master 'repl@172.16.4.1:3306',replication starts at GTID position '0-1-482499150'
2016-12-09 11:48:05 140259546905344 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001443' at position 973807119, relay log './sql-slave-relay-bin.000001' position: 4; GTID position '0-1-482499150'
2016-12-09 11:50:29 140259546905344 [Note] Error reading relay log event: slave SQL thread was killed
2016-12-09 11:50:29 140259564768000 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.001443', position 1041861449; GTID position 0-1-482509090
2016-12-09 11:50:43 140259546905344 [Note] Slave I/O thread: connected to master 'repl@172.16.4.1:3306',replication starts at GTID position '0-1-482509090'
2016-12-09 11:50:43 140259548117760 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001443' at position 1041861449, relay log './sql-slave-relay-bin.000001' position: 4; GTID position '0-1-482509090'
2016-12-09 11:50:45 140259548117760 [Note] Error reading relay log event: slave SQL thread was killed
2016-12-09 11:52:03 140259547511552 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001443' at position 1046751490, relay log './sql-slave-relay-bin.000002' position: 4890750; GTID position '0-1-482509269'
2016-12-09 11:52:05 140259547511552 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.001443' at position 1047219058; GTID position '0-1-482509775'

Switching back to conservative mode, everything works properly again. Any idea what could be wrong?

Thanks!

Jocelyn Fournier
Founder
M : +33 6 51 21 54 10
https://www.softizy.com
Softizy - At your side to Optimize your PHP / MySQL applications
jocelyn fournier <jocelyn.fournier@softizy.com> writes:
I've just tried your tokudb_optimistic_parallel_replication branch, and it behaves very strangely: the SQL thread stops by itself without any replication error when the parallel_mode is set to optimistic.
That's strange, the log looks like a normal STOP SLAVE. Do you have a way I could reproduce the issue?

 - Kristian.
Actually, it is a normal stop slave. The first one is when I shut down the server and switched to the new version; the second one when I ran stop slave and set global slave_parallel_mode=optimistic; and the last one when I put back the conservative mode. So there's actually nothing in the log related to Slave_SQL_Running stopping by itself, while Slave_IO_Running is still set to Yes.

Jocelyn Fournier
Founder
M : +33 6 51 21 54 10
https://www.softizy.com
Softizy - At your side to Optimize your PHP / MySQL applications
Ok, to make it clear, here are the actions related to the log entries:

*** Server startup: ***

2016-12-09 11:48:05 140259564768000 [Note] Slave I/O thread: connected to master 'repl@172.16.4.1:3306',replication starts at GTID position '0-1-482499150'
2016-12-09 11:48:05 140259546905344 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001443' at position 973807119, relay log './sql-slave-relay-bin.000001' position: 4; GTID position '0-1-482499150'

stop slave; set global slave_parallel_mode=optimistic;

2016-12-09 11:50:29 140259546905344 [Note] Error reading relay log event: slave SQL thread was killed
2016-12-09 11:50:29 140259564768000 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.001443', position 1041861449; GTID position 0-1-482509090

*** start slave; ***

2016-12-09 11:50:43 140259546905344 [Note] Slave I/O thread: connected to master 'repl@172.16.4.1:3306',replication starts at GTID position '0-1-482509090'
2016-12-09 11:50:43 140259548117760 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001443' at position 1041861449, relay log './sql-slave-relay-bin.000001' position: 4; GTID position '0-1-482509090'

*** SQL thread stopping by itself: ***

2016-12-09 11:50:45 140259548117760 [Note] Error reading relay log event: slave SQL thread was killed

(no message "Slave SQL thread exiting...")

Associated show slave status:

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.4.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.001443
          Read_Master_Log_Pos: 1047027234
               Relay_Log_File: sql-slave-relay-bin.000002
                Relay_Log_Pos: 4890750
        Relay_Master_Log_File: mysql-bin.001443
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: sc_2,percona
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1046751490
              Relay_Log_Space: 5166796
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 0-1-482509567
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: optimistic
1 row in set (0.00 sec)

*** start slave; ***

2016-12-09 11:52:03 140259547511552 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001443' at position 1046751490, relay log './sql-slave-relay-bin.000002' position: 4890750; GTID position '0-1-482509269'

*** SQL thread stopping by itself ***

This time no "slave SQL thread was killed" message.

2016-12-09 11:52:05 140259547511552 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.001443' at position 1047219058; GTID position '0-1-482509775'

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.4.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.001444
          Read_Master_Log_Pos: 6700262
               Relay_Log_File: sql-slave-relay-bin.000002
                Relay_Log_Pos: 5358318
        Relay_Master_Log_File: mysql-bin.001443
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: sc_2,percona
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1047219058
              Relay_Log_Space: 38586524
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 0-1-482514272
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: optimistic
1 row in set (0.00 sec)

HTH,

Jocelyn Fournier
Founder
M : +33 6 51 21 54 10
https://www.softizy.com
Softizy - At your side to Optimize your PHP / MySQL applications
I've just checked what happened at position 1046751490 in the binlog. It seems the entry corresponds to a table having a trigger; I don't know if it could have an impact on the parallel replication?

/*!100001 SET @@session.gtid_seq_no=482509269*//*!*/;
BEGIN
/*!*/;
# at 1046751112
#161209 11:50:35 server id 1  end_log_pos 1046751203  Table_map: `sc_2`.`sc_feed` mapped to number 93614 (has triggers)
# at 1046751203
#161209 11:50:35 server id 1  end_log_pos 1046751299  Table_map: `sc_2`.`_sc_feed_new` mapped to number 197
# at 1046751299
# at 1046751381
#161209 11:50:35 server id 1  end_log_pos 1046751381  Write_rows: table id 93614
#161209 11:50:35 server id 1  end_log_pos 1046751463  Write_rows: table id 197 flags: STMT_END_F
BINLOG '
e4xKWBMBAAAAWwAAAOMnZD4AAK5tAQAAAAFABHNjXzIAB3NjX2ZlZWQAJAMDAwMDAwEBAQEBAQEB
AQ8DAQEBAwMDAwMDAwEDAwMDEREBDwYsAQAA/QL8///3CQ==
e4xKWBMBAAAAYAAAAEMoZD4AAMUAAAAAAAEABHNjXzIADF9zY19mZWVkX25ldwAkAwMDAwMDAQEB
AQEBAQEBDwMBAQEDAwMDAwMDAQMDAwMREQEPBiwBAAD9Avz///cJ
e4xKWBcBAAAAUgAAAJUoZD4AAK5tAQAAAAAAJP//////FPLf9/AwWw0HuAsGAF0SDQAXMboGBggH
AAAAAAAAAFhKjHtYSox7AQcAY3Jvbi0wMQ==
### INSERT INTO `sc_2`.`sc_feed`
### SET
###   @1=118315824 /* INT meta=0 nullable=0 is_null=0 */
###   @2=396216 /* INT meta=0 nullable=0 is_null=0 */
###   @3=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @4=856669 /* INT meta=0 nullable=1 is_null=0 */
###   @5=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @6=112865559 /* INT meta=0 nullable=1 is_null=0 */
###   @7=6 /* TINYINT meta=0 nullable=1 is_null=0 */
###   @8=8 /* TINYINT meta=0 nullable=1 is_null=0 */
###   @9=7 /* TINYINT meta=0 nullable=1 is_null=0 */
###   @10=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @11=0 /* TINYINT meta=0 nullable=1 is_null=0 */
###   @12=0 /* TINYINT meta=0 nullable=1 is_null=0 */
###   @13=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @14=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @15=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @16=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
###   @17=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @18=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @19=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @20=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @21=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @22=0 /* INT meta=0 nullable=1 is_null=0 */
###   @23=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @24=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @25=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @26=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @27=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @28=0 /* TINYINT meta=0 nullable=0 is_null=0 */
###   @29=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @30=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @31=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @32=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @33=1481280635 /* TIMESTAMP(0) meta=0 nullable=1 is_null=0 */
###   @34=1481280635 /* TIMESTAMP(0) meta=0 nullable=0 is_null=0 */
###   @35=1 /* TINYINT meta=0 nullable=0 is_null=0 */
###   @36='cron-01' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */
e4xKWBcBAAAAUgAAAOcoZD4AAMUAAAAAAAEAJP//////FPLf9/AwWw0HuAsGAF0SDQAXMboGBggH
AAAAAAAAAFhKjHtYSox7AQcAY3Jvbi0wMQ==
'/*!*/;
### INSERT INTO `sc_2`.`_sc_feed_new`
### SET
###   @1=118315824 /* INT meta=0 nullable=0 is_null=0 */
###   @2=396216 /* INT meta=0 nullable=0 is_null=0 */
###   @3=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @4=856669 /* INT meta=0 nullable=1 is_null=0 */
###   @5=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @6=112865559 /* INT meta=0 nullable=1 is_null=0 */
###   @7=6 /* TINYINT meta=0 nullable=1 is_null=0 */
###   @8=8 /* TINYINT meta=0 nullable=1 is_null=0 */
###   @9=7 /* TINYINT meta=0 nullable=1 is_null=0 */
###   @10=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @11=0 /* TINYINT meta=0 nullable=1 is_null=0 */
###   @12=0 /* TINYINT meta=0 nullable=1 is_null=0 */
###   @13=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @14=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @15=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @16=NULL /* VARSTRING(300) meta=300 nullable=1 is_null=1 */
###   @17=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @18=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @19=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @20=NULL /* TINYINT meta=0 nullable=1 is_null=1 */
###   @21=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @22=0 /* INT meta=0 nullable=1 is_null=0 */
###   @23=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @24=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @25=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @26=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @27=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @28=0 /* TINYINT meta=0 nullable=0 is_null=0 */
###   @29=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @30=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @31=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @32=NULL /* INT meta=0 nullable=1 is_null=1 */
###   @33=1481280635 /* TIMESTAMP(0) meta=0 nullable=1 is_null=0 */
###   @34=1481280635 /* TIMESTAMP(0) meta=0 nullable=0 is_null=0 */
###   @35=1 /* TINYINT meta=0 nullable=0 is_null=0 */
###   @36='cron-01' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */
# at 1046751463
#161209 11:50:35 server id 1  end_log_pos 1046751490  Xid = 9097980179
COMMIT/*!*/;
# at 1046751490

SHOW CREATE TRIGGER pt_osc_sc_2_sc_feed_ins\G
*************************** 1. row ***************************
               Trigger: pt_osc_sc_2_sc_feed_ins
              sql_mode: NO_AUTO_VALUE_ON_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
SQL Original Statement: CREATE DEFINER=`root`@`localhost` TRIGGER `pt_osc_sc_2_sc_feed_ins` AFTER INSERT ON `sc_2`.`sc_feed` FOR EACH ROW REPLACE INTO `sc_2`.`_sc_feed_new` (`id`, `user_id`, `user_id_scout`, `product_id`, `contact_id`, `collection_id`, `type_id`, `subtype_id`, `rating`, `is_shopping_list`, `is_wish_list`, `is_wish_list_bis`, `is_recommend`, `is_review`, `is_current`, `list_id_product`, `list_id_list`, `is_like`, `is_comment_list`, `is_comment_review`, `feed_id_related`, `format_id`, `badge_id`, `param_index_id_trailer`, `message_id`, `shout_id`, `forum_topic_id`, `is_archive`, `application_id`, `gen_like_count`, `gen_comment_count`, `gen_message_topic_id`, `date_creation`, `date_last_update`, `flag_gen`, `host`) VALUES (NEW.`id`, NEW.`user_id`, NEW.`user_id_scout`, NEW.`product_id`, NEW.`contact_id`, NEW.`collection_id`, NEW.`type_id`, NEW.`subtype_id`, NEW.`rating`, NEW.`is_shopping_list`, NEW.`is_wish_list`, NEW.`is_wish_list_bis`, NEW.`is_recommend`, NEW.`is_review`, NEW.`is_current`, NEW.`list_id_product`, NEW.`list_id_list`, NEW.`is_like`, NEW.`is_comment_list`, NEW.`is_comment_review`, NEW.`feed_id_related`, NEW.`format_id`, NEW.`badge_id`, NEW.`param_index_id_trailer`, NEW.`message_id`, NEW.`shout_id`, NEW.`forum_topic_id`, NEW.`is_archive`, NEW.`application_id`, NEW.`gen_like_count`, NEW.`gen_comment_count`, NEW.`gen_message_topic_id`, NEW.`date_creation`, NEW.`date_last_update`, NEW.`flag_gen`, NEW.`host`)
  character_set_client: latin1
  collation_connection: latin1_swedish_ci
    Database Collation: utf8_general_ci
1 row in set (0.00 sec)

Jocelyn Fournier
Founder
M : +33 6 51 21 54 10
https://www.softizy.com
Softizy - At your side to Optimize your PHP / MySQL applications
jocelyn fournier <jocelyn.fournier@softizy.com> writes:
It seems the entry corresponds to a table having a trigger; I don't know if it could have an impact on the parallel replication?
Triggers should work ok; otherwise it would be a bug.

But it's hard to tell what is going on without a way to reproduce... I don't recall having seen the SQL thread "stopping by itself" before. Even if parallel replication worker threads somehow ended up killing the SQL thread instead of a conflicting parallel transaction, I would expect something in the log...

 - Kristian.
When running the server in debug, I hit the following assert:

2016-12-09 18:12:03 140057763633920 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001445' at position 4, relay log './sql-slave-relay-bin.000001' position: 4; GTID position '0-1-483188364'
mysqld: /home/jfournier/server/sql/rpl_parallel.cc:210: void finish_event_group(rpl_parallel_thread*, uint64, rpl_parallel_entry*, rpl_group_info*): Assertion `!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id' failed.
161209 18:12:06 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.1.19-MariaDB-debug
key_buffer_size=0
read_buffer_size=2097152
max_used_connections=0
max_threads=1502
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6183865 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7f60fc016070
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f61e08f61b8 thread_stack 0x48400
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x29)[0xd349f9]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x2dd)[0x7b9f4d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f61e5e2d330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f61e5484c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f61e5488028]
/lib/x86_64-linux-gnu/libc.so.6(+0x2fbf6)[0x7f61e547dbf6]
/lib/x86_64-linux-gnu/libc.so.6(+0x2fca2)[0x7f61e547dca2]
/usr/local/mysql/bin/mysqld[0x745b63]
/usr/local/mysql/bin/mysqld(handle_rpl_parallel_thread+0x685)[0x7468b5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f61e5e25184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f61e554837d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): is an invalid pointer
Connection ID (thread ID): 5
Status: NOT_KILLED

HTH,

Jocelyn Fournier
Founder
M : +33 6 51 21 54 10
https://www.softizy.com
Softizy - At your side to Optimize your PHP / MySQL applications
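For readers following along, a short gloss of that assertion (line 210 of sql/rpl_parallel.cc in this tree). The comparison operator was mangled by the archive; '>' is assumed from the 10.1 source, and the explanation is my hedged reading of it:

  // sql/rpl_parallel.cc, finish_event_group() -- '>' reconstructed.
  // sub_id is the in-order commit sequence number of the event group being
  // finished; a group-commit orderer (GCO) describes a batch of event groups
  // allowed to run in parallel, with last_sub_id marking the last group of
  // the batch. The assert sanity-checks this bookkeeping while completed
  // GCOs are freed: firing means the scheduler's commit-ordering state is
  // inconsistent, which would fit the unexplained SQL thread stops above.
  DBUG_ASSERT(!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id);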
jocelyn fournier <jocelyn.fournier@softizy.com> writes:
When running the server in debug, I hit the following assert:

mysqld: /home/jfournier/server/sql/rpl_parallel.cc:210: void finish_event_group(rpl_parallel_thread*, uint64, rpl_parallel_entry*, rpl_group_info*): Assertion `!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id' failed.
Ok, that could be useful info, I'll look into it.

Thanks,

 - Kristian.