[Maria-developers] some bugs in dingqing parallel replication
Hi guys,

I have worked on this branch https://code.launchpad.net/~knielsen/maria/dingqi-parallel-replication for some days and found the bugs listed below. I hope this is helpful to you.

1. When a table filter is enabled on the slave, the server can crash.

How to reproduce: on the slave, set replicate-wild-ignore-table = test.t5 in the config file. On the master, run these statements:

    CREATE TABLE test.t3 (a INT AUTO_INCREMENT PRIMARY KEY, b DECIMAL(20,20), c INT);
    SET INSERT_ID=1;
    SET @c=2;
    SET @@rand_seed1=10000000, @@rand_seed2=1000000;
    INSERT INTO t3 VALUES (NULL, RAND(), @c);

Code that leads to this bug: in execute_single_transaction()

    case RAND_EVENT:
      need_remove_from_trans= true;
      if(!rli->is_deferred_event(ev))
        delete ev;
      break;

Reason: the Rand event object is deleted in execute_single_transaction(), but its pointer is used later in slave_execute_deferred_events().

2. The SQL thread can read and apply some log events more than once.

How to reproduce: it is a little hard to reproduce. If you set max_relay_log_size=100M and keep the SQL thread close behind the IO thread, the bug may appear.

Code that leads to this bug: in reopen_relay_log()

    rli->event_relay_log_pos= max(rli->event_relay_log_pos, BIN_LOG_HEADER_SIZE);
    my_b_seek(cur_log,rli->event_relay_log_pos);

Reason: when the SQL thread is using a hot log that the IO thread has just closed, the SQL thread needs to reopen this log and set the read offset to rli->event_relay_log_pos. However, rli->event_relay_log_pos can be set to a new value by another thread, because many threads apply log events. So rli->event_relay_log_pos can be less than rli->future_event_relay_log_pos, and the seek then goes back to events that were already read.

3. The SQL thread does not report an error in the output of "show slave status", and replication does not stop, when the slave inserts a duplicate record into a table with a primary key.

How to reproduce: just change master_log_pos so that duplicate records are read from the master.

Code that leads to this bug: in execute_single_transaction()

    retry_transaction:
      ev= trans->event_list_head;
      ... ...
      if (ret && rli->trans_retries < slave_trans_retries)
      {
        ...
        goto retry_transaction;
      }

Reason: as I said in another email, Rows_log_event::do_apply_event() is executed twice but returns a different result the second time, because m_curr_row==m_rows_end on the second run.

4. Operations such as "show slave status" and "stop slave" can be blocked for a long time.

How to reproduce: just run "show slave status" again and again.

Code that leads to this bug: in queue_event()

    case FORMAT_DESCRIPTION_EVENT:
      ...
      wait_for_all_dml_done(&mi->rli, true);

and in process_io_rotate()

    wait_for_all_dml_done(&mi->rli, true);

Reason: the IO thread can wait in wait_for_all_dml_done() while holding rpl_mi->data_lock, so operations like "show slave status" block while waiting for rpl_mi->data_lock.

5. "START SLAVE UNTIL" makes replication stop at the wrong place.

How to reproduce: suppose the log events in the relay log look like this:

    BEGIN;        ------> pos1
    LOG_EVENT1;
    LOG_EVENT2;
    COMMIT;       ------> pos2
    BEGIN;        ------> pos3
    LOG_EVENT3;   ------> stop_pos
    LOG_EVENT4;
    COMMIT;       ------> pos4

If we run START SLAVE UNTIL relay_log_pos=stop_pos; replication should stop at pos4, but it stops at pos2.

6. log_event->thd is wrong.

Suppose a log_event was read in thread_1, so log_event->thd==thread_1, but this log_event may be dispatched to another thread (say thread_2). The log_event is applied in thread_2, yet log_event->thd==thread_1 still holds. This problem can make applying the log event fail in MySQL, but in MariaDB it seems OK.

2013-07-19
nanyi607rao
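The stop position expected in item 5 can be illustrated with a small standalone C++ sketch. It is not code from the branch: the Event and Kind types, both function names, and the numeric positions are hypothetical and exist only for this example, and the log is assumed to be sorted by position. The first function models the outcome described in the report (stopping at the last COMMIT before stop_pos); the second models the expected outcome (finishing the transaction that contains stop_pos).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified event record; not a MariaDB type.
enum class Kind { Begin, Row, Commit };

struct Event {
    std::uint64_t pos;  // position of the event in the relay log
    Kind kind;
};

// Outcome described in the report: stop at the last COMMIT located
// before until_pos. Returns 0 if there is no such COMMIT.
std::uint64_t stop_before_until(const std::vector<Event>& log,
                                std::uint64_t until_pos) {
    std::uint64_t stop = 0;
    for (const Event& ev : log) {
        if (ev.pos >= until_pos) break;
        if (ev.kind == Kind::Commit) stop = ev.pos;
    }
    return stop;
}

// Expected outcome: keep applying until the transaction that contains
// until_pos has committed. Returns 0 if no COMMIT exists at or after
// until_pos.
std::uint64_t stop_at_transaction_end(const std::vector<Event>& log,
                                      std::uint64_t until_pos) {
    for (const Event& ev : log) {
        if (ev.kind == Kind::Commit && ev.pos >= until_pos) return ev.pos;
    }
    return 0;
}
```

With the layout from item 5 mapped to made-up positions 100, 110, ..., 170 (so pos2 is 130, stop_pos is 150 and pos4 is 170), stop_before_until returns 130 and stop_at_transaction_end returns 170.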
nanyi607rao <nanyi607rao@gmail.com> writes:
1. When a table filter is enabled on the slave, the server can crash.
How to reproduce: on the slave, set replicate-wild-ignore-table = test.t5 in the config file. On the master, run these statements: CREATE TABLE test.t3 (a INT AUTO_INCREMENT PRIMARY KEY, b DECIMAL(20,20), c INT); SET INSERT_ID=1; SET @c=2; SET @@rand_seed1=10000000, @@rand_seed2=1000000; INSERT INTO t3 VALUES (NULL, RAND(), @c);
Code that leads to this bug: in execute_single_transaction() case RAND_EVENT: need_remove_from_trans= true; if(!rli->is_deferred_event(ev)) delete ev; break; Reason: the Rand event object is deleted in execute_single_transaction(), but its pointer is used later in slave_execute_deferred_events().
This sounds like a bug we fixed in Percona Server a while ago. Filtered replication is kinda awful, and when we started to poke at it, it pretty much never worked properly. It may be that this is not specific to this replication patch.

--
Stewart Smith
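The lifetime problem in item 1 (an event freed by the worker while the deferred-event list still points at it) can be modeled outside the server. The following is only a sketch under stated assumptions, not the fix applied in Percona Server or in the branch: Event, DeferredList and release_event are hypothetical names, and std::unique_ptr is used here to make the ownership handoff explicit, which the server code quoted above does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a replication event such as a RAND event.
struct Event {
    explicit Event(int* live) : live_(live) { ++*live_; }
    ~Event() { --*live_; }
    Event(const Event&) = delete;
    Event& operator=(const Event&) = delete;
private:
    int* live_;  // counts live events so the example can be checked
};

// Hypothetical stand-in for the list that slave_execute_deferred_events()
// would drain; it is the sole owner of every event handed to it.
class DeferredList {
public:
    void add(std::unique_ptr<Event> ev) { events_.push_back(std::move(ev)); }
    std::size_t size() const { return events_.size(); }
    // "Applies" (here: only frees) each deferred event exactly once and
    // returns how many events were processed.
    std::size_t execute_all() {
        std::size_t n = events_.size();
        events_.clear();
        return n;
    }
private:
    std::vector<std::unique_ptr<Event>> events_;
};

// Called by the worker when it is done with an event. A deferred event
// is moved into the list; any other event is destroyed on return.
void release_event(std::unique_ptr<Event> ev, bool deferred,
                   DeferredList& list) {
    if (deferred) list.add(std::move(ev));
}
```

Because ownership moves into the list, the worker has no pointer left to delete, so a deferred event cannot be freed before the list runs it.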